Comparative Evolutionary Genomics of Cotton
Evolutionary history of the genus Gossypium. As Gossypium diversified and colonized arid regions of the globe, it underwent extensive chromosomal evolution. Presently, eight diploid genome groups (A through G, plus K) are recognized among the 50 species. Relationships within and among the various genome groups have been addressed in detail, and genome groups largely correspond to monophyletic lineages. This information has been embodied in our synthesis of relationships and of genome evolution (Fig. 1), which reveals four major lineages of diploid species corresponding to three continents.
Polyploidy and genome evolution in Gossypium. Polyploidization is a prominent process in plant evolution and is important in crop plants as well. Both G. hirsutum and G. barbadense are classic allopolyploids, tracing to a seemingly improbable hybridization between an African/Asian A-genome species and an American D-genome species ~1-2 mya. Each has a large indigenous range and encompasses myriad morphological forms that span the wild-to-domesticated continuum. Polyploidization is widely perceived to provide raw material for evolutionary novelty; in this respect, genome doubling in cotton may have offered novel opportunities for agronomic improvement through human selection. Because modern tetraploid cotton has lint characteristics that are non-additive relative to those of its diploid progenitors (Fig. 2), polyploidy per se appears to have been critical to the productivity and quality of modern cotton. We recently developed a novel microarray platform for studying the contributions of the two co-resident genomes to cotton development, which promises to provide insight into the evolutionary and agronomic significance of genome doubling.

Fig. 1. Phylogenetic framework and genome size variation. Gossypium contains 45 diploid (n = 13) and five allopolyploid (n = 26) species, the latter with two genomes, A and D, derived from species in different hemispheres. The A- and D-genome groups diverged ~5-10 MYA. Allopolyploid cottons originated following trans-oceanic dispersal of an A-genome progenitor to the New World. Arrows indicate taxa representing the best living models of the species involved in allopolyploid formation. Allopolyploids diversified during the Pleistocene into three lineages, two of which include the commercially important species G. hirsutum (upland cotton) and G. barbadense (Pima cotton), each domesticated within the last 4000-7000 years.
Cotton fiber diversity and evolution. As Gossypium diversified, so did seed and trichome morphology. Extant species exhibit extraordinary variation in seed size and in the length, color, and density of the layer of single-celled trichomes on the seed surface (Fig. 2). Seed coverings range from nearly glabrous (e.g., G. klotzschianum and G. davidsonii), to short (several mm), stiff, brown hairs that aid in wind dispersal (G. australe, G. nelsonii), to the long, fine, white fibers that characterize improved forms of the four cultivated species. In all but the modern forms of the domesticated species, seed fibers adhere to the seed coat. As noted above, cotton was domesticated in parallel, involving two species from the Americas, G. hirsutum and G. barbadense, and two from Africa-Asia, G. arboreum and G. herbaceum. In each case, aboriginal peoples discovered thousands of years ago that the unique properties of cotton fibers made them useful for ropes, textiles, and other applications. A notable aspect of this history is that similar domestication processes resulted in apparently similar morphological transformations, including decreased plant stature, loss of photoperiod sensitivity, loss of seed dormancy, and, most notably, a dramatic increase in the abundance, length, and quality of seed epidermal fiber (Fig. 2). Gossypium hirsutum presently accounts for >90% of the global cotton crop, having spread from its ancestral home in Mesoamerica to over 50 countries in both hemispheres. To understand the developmental differences that account for fiber length variation, and to place these differences in a phylogenetic context, Applequist et al. examined ovules at and near the time of flowering by scanning electron microscopy (SEM) and generated fiber growth curves. They showed that variation in mature morphologies reflects differences during fiber expansion, secondary wall synthesis, and maturation. Developmental profiles of the fibers of most wild species are similar, with fiber elongation terminating at ~14 days post-anthesis.
In contrast, elongation is extended to ~21 days in the A-genome and F-genome diploids. Viewed in the phylogenetic framework of Fig. 1, prolonged elongation is thus revealed as a key evolutionary step in the origin of spinnable fiber, one that took place in Africa prior to domestication. Our recent comparative expression profiling work suggests that the evolutionary transition leading to spinnable A-genome fiber involved a prolongation of an ancestral developmental program, as well as a novel metabolism involving enhanced regulation of reactive oxygen species.
Fig. 2. Variation in seed trichome (fiber) morphology in wild and domesticated cottons. Gossypium seeds exhibit remarkable variation among the ~50 wild and domesticated species. Illustrated are examples from wild species at both the diploid and allopolyploid levels. Early stages in fiber initiation are similar in all species, but developmental variation during primary and secondary wall deposition leads to radically different mature morphologies. Taxa studied here include domesticated and wild G. hirsutum (Cult. and Wild AD1, respectively) and G. barbadense (not shown).
Domestication itself has been associated with further elongation at both the diploid and allopolyploid levels. This raises the possibility that parallel selection for long fiber in the cultivated species resulted in a genetically convergent or parallel transformation of the developmental program. Ongoing work using a microarray platform capable of distinguishing homoeologous transcripts is testing this notion. Analyses also indicate a high level of novel expression of D-genome genes during fiber development in allopolyploid cotton. Allopolyploidization may thus have provided the raw material for the evolution of novel gene expression patterns, which were subsequently exploited by the aboriginal domesticators (and perhaps modern plant breeders) of G. hirsutum and G. barbadense. Superimposing the morphological and developmental variation and the multiple, parallel domestications on the organismal framework (Fig. 1) identifies the key transformations in cotton fiber evolution and improvement. This perspective provides the foundation for the proposed expression profiling experiments on phenotypically selected introgression lines.
Experimental Design – One of the exciting opportunities stimulated by the convergence of modern genomic approaches with other areas of biology is resolving the enigmatic processes by which new phenotypes arise. Here we propose a multifaceted program designed to further our understanding of the complex genetic architecture that underlies form, and to elucidate biological processes involved in developmental, agronomic, and evolutionary change. Using a well-developed model system, the cotton genus (Gossypium), together with new genomic resources, we will identify the steps and complexities involved in transforming primitive trichomes into the economically important fibers of modern cotton cultivars. This program promises insight into fundamental biological processes underlying fiber development and evolution, while providing vital resources for cotton improvement. The research involves four interrelated components:
Component 1: Develop and characterize immortal introgression populations that reduce complex morphology to defined constituents amenable to functional genomic analyses
- Dissect the stages involved in transforming primitive trichomes into the economically important fibers of modern cotton cultivars
- Associate important cotton phenotypes with gene expression
- Provide a valuable resource for identifying high-likelihood candidate genes underlying QTLs for virtually any trait
- Evaluate the comparative genomics of cotton domestication
Component 2: Develop an enriched EST resource for expression profiling, gene discovery, and SNP discovery
- Expand the existing EST collection by generating ~500 Mb of new reads through 454 sequencing of cDNAs from A- and D-genome diploids and AD-genome allopolyploids (including both wild and domesticated forms of the two most important cultivated species, G. hirsutum and G. barbadense)
- Following global assembly, create a database for designing and applying homoeolog-specific expression profiling platforms. The goal is to identify SNPs diagnostic of the A-genome and D-genome homoeologs for >50% of the genes in the genome
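To illustrate what a homoeolog-diagnostic SNP means in practice, the sketch below assumes that orthologous transcripts from an A-genome diploid and a D-genome diploid have already been aligned to equal length. Positions where the two diploids carry different unambiguous nucleotides are treated as candidate diagnostic SNPs, and a read from the allopolyploid is assigned to the A or D homoeolog by majority vote over the positions it covers. The function names, the majority-vote rule, and the use of diploids as stand-ins for the two subgenomes are assumptions for illustration only, not the assembly or SNP-calling pipeline proposed here. Because the diploids are only models of the actual progenitors (Fig. 1), some such differences may not distinguish homoeologs in the allopolyploid and would need confirmation.

```python
from typing import Dict, List, Tuple

VALID_BASES = set("ACGT")
Snp = Tuple[int, str, str]  # (alignment position, A-genome base, D-genome base)


def diagnostic_snps(a_seq: str, d_seq: str) -> List[Snp]:
    """Hypothetical helper: positions where aligned A- and D-diploid sequences
    carry different unambiguous bases (candidate homoeolog-diagnostic SNPs)."""
    if len(a_seq) != len(d_seq):
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    for i, (a, d) in enumerate(zip(a_seq.upper(), d_seq.upper())):
        # Skip gaps and ambiguity codes; keep only clean A/D differences.
        if a in VALID_BASES and d in VALID_BASES and a != d:
            snps.append((i, a, d))
    return snps


def assign_read(read: str, start: int, snps: List[Snp]) -> str:
    """Hypothetical helper: assign an allopolyploid read aligned at `start`
    to the A or D homoeolog by majority vote over covered diagnostic sites."""
    a_votes = d_votes = 0
    for pos, a, d in snps:
        offset = pos - start
        if 0 <= offset < len(read):
            base = read[offset].upper()
            if base == a:
                a_votes += 1
            elif base == d:
                d_votes += 1
    if a_votes > d_votes:
        return "A"
    if d_votes > a_votes:
        return "D"
    return "unassigned"  # no informative positions covered, or a tie


def fraction_with_diagnostic(genes: Dict[str, Tuple[str, str]]) -> float:
    """Fraction of genes (name -> (A seq, D seq)) with >= 1 diagnostic SNP."""
    if not genes:
        return 0.0
    hits = sum(1 for a_seq, d_seq in genes.values() if diagnostic_snps(a_seq, d_seq))
    return hits / len(genes)


if __name__ == "__main__":
    snps = diagnostic_snps("ATGCCGTA", "ATGACGTT")  # made-up sequences
    print(snps)                           # [(3, 'C', 'A'), (7, 'A', 'T')]
    print(assign_read("CCGTA", 3, snps))  # A
    print(assign_read("ACGTT", 3, snps))  # D
```

The last function reports the fraction of genes carrying at least one candidate diagnostic SNP, which is the kind of quantity the >50% goal refers to.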
Component 3: Study perturbations in genetic networks and gene expression associated with naturally occurring variation in fiber phenotypes using the introgression lines
- Dissect complex phenotypic differences between species, and between wild and domesticated forms, using near-isogenic introgression lines (NILs) in conjunction with phenotypic and genetic analysis. We will select 10-15 lines (representing 10-15 chromosomal segments) from each of the four NIL populations (40-60 lines total) that vary in key fiber quality components.
- Lines will be evaluated using homoeolog-diagnostic expression profiling to reveal the expression changes caused by introgressed chromosome segments that produce important changes in fiber phenotype.
- By comparing effects among populations, we will evaluate whether the twin domestications of G. barbadense and G. hirsutum led to parallel changes in genetic architecture underlying fiber morphology and development.
- Similarly, through comparisons involving homoeologous regions (within and between populations), we will explore the effects of homoeology on gene network perturbations and key innovations in fiber cells under domestication.
- We will characterize the cis and trans effects underlying these phenotypes by mapping a subset of the differentially expressed genes in the NILs from the wild vs. domesticated G. hirsutum cross. Key genes will also be examined for evidence of selection (see Component 4).
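One simple way to frame the cis/trans distinction with NIL data is sketched below, under stated assumptions: a differentially expressed gene that maps inside a line's introgressed segment is flagged as a candidate cis (local) effect, and one that maps elsewhere in the genome as a trans effect. The classes, chromosome labels, and coordinates are hypothetical, and this is an illustration rather than the mapping strategy of the proposal. A gene inside the segment may still be regulated in trans by another gene carried on the same segment, so the "cis" label is better read as "local".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical introgressed donor segment in one NIL."""
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


@dataclass(frozen=True)
class DEGene:
    """Hypothetical differentially expressed gene with its mapped position."""
    name: str
    chrom: str
    pos: int


def classify_effects(segments: List[Segment], genes: List[DEGene]) -> Dict[str, str]:
    """Label each DE gene 'cis (local)' if it maps inside any introgressed
    segment of this line, otherwise 'trans'."""
    calls = {}
    for g in genes:
        local = any(s.chrom == g.chrom and s.start <= g.pos <= s.end for s in segments)
        calls[g.name] = "cis (local)" if local else "trans"
    return calls


def summarize(calls: Dict[str, str]) -> Dict[str, int]:
    """Count DE genes in each class."""
    return dict(Counter(calls.values()))


if __name__ == "__main__":
    nil_segments = [Segment("A05", 1_000_000, 3_000_000)]  # made-up segment
    de_genes = [
        DEGene("gene1", "A05", 2_000_000),  # inside the introgression
        DEGene("gene2", "D05", 2_000_000),  # other subgenome, outside
        DEGene("gene3", "A05", 5_000_000),  # same chromosome, outside
    ]
    calls = classify_effects(nil_segments, de_genes)
    print(calls)
    print(summarize(calls))
```

Comparing the resulting counts across NILs and populations is one way the relative weight of local versus distant regulation could be summarized.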
Component 4: Provide the foundation for understanding the effects of selection on genetic diversity in cotton.
- Use a population genetic approach (screens of genome-wide genetic diversity) to connect phenotypic and genetic data to the process of domestication and to reveal the targets of artificial selection.
- We will estimate nucleotide diversity for both randomly sampled (with respect to genomic location) and targeted regions (based on phenotypic and genetic analysis of the G. hirsutum NIL population) in a panel of accessions spanning the wild-to-domesticated continuum. This will shed light on levels of nucleotide variation throughout the genome, as well as on which portion, and what proportion, of that variation has been captured in modern cultivars.
- This analysis will serve as the context for evaluating regions of reduced diversity, in conjunction with evidence from phenotypes, expression analyses, homoeolog-specificity, and population genetic bottlenecks. These data also will set the stage for future design of association mapping experiments.
- Evaluate the previously unexplained observation that, within the allopolyploid nucleus, D-genome homoeologs accumulate diversity at a higher rate than do A-genome homoeologs.
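For reference, nucleotide diversity (π) is commonly estimated as the average number of pairwise differences per site among the sampled sequences. The sketch below assumes gap-free alignments of equal length for each panel and ignores missing data; the example sequences are made up, and using the ratio π(cultivated)/π(wild) as a summary of the proportion of diversity retained in cultivars is an illustrative assumption, not a method or result from this proposal.

```python
from itertools import combinations
from typing import List


def nucleotide_diversity(seqs: List[str]) -> float:
    """Average pairwise differences per site (pi) for aligned, gap-free
    sequences of equal length."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


def retained_fraction(cultivated: List[str], wild: List[str]) -> float:
    """Hypothetical summary: pi(cultivated) / pi(wild) at one locus."""
    pi_wild = nucleotide_diversity(wild)
    if pi_wild == 0:
        raise ValueError("wild panel has no variation at this locus")
    return nucleotide_diversity(cultivated) / pi_wild


if __name__ == "__main__":
    wild = ["ACGTACGTAC", "ACGTTCGTAC", "ACCTACGTAA", "ACGTACGAAC"]  # made up
    cultivated = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]       # made up
    print(f"{nucleotide_diversity(wild):.3f}")           # 0.200
    print(f"{nucleotide_diversity(cultivated):.3f}")     # 0.067
    print(f"{retained_fraction(cultivated, wild):.3f}")  # 0.333
```

The same function could be applied separately to A- and D-homoeolog alignments from the same accessions when comparing diversity between the two subgenomes.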