Global genomic architecture in diploid Gossypium
To discover the genomic components responsible for genome size variation in Gossypium, we generated sequence data from whole genome shotgun (WGS) libraries for three representative diploid species that span a roughly 3-fold range in DNA content, plus one outgroup species, Gossypioides kirkii (Hawkins et al. 2006, in press). Approximately 0.2% of the haploid genome of each species was sequenced, yielding a total of almost 12 Mb of sequence.
Taxon (genome group) | Genome size (Mb) | Clones in library | Clones successfully sequenced | Average read length (bp) | Genome sequenced (%) | Sequence obtained (Mb)
Gossypioides kirkii (outgroup) | 588 | 1920 | 1535 | 753 | 0.20 | 1.15
Gossypium raimondii (D genome) | 880 | 3072 | 2815 | 770 | 0.25 | 2.17
Gossypium herbaceum (A genome) | 1667 | 6048 | 4994 | 704 | 0.21 | 3.52
Gossypium exiguum (K genome) | 2460 | 10368 | 6980 | 704 | 0.20 | 4.91
Total | | | | | | 11.75
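The last two columns of the table follow arithmetically from the others: sequence obtained is the number of successfully sequenced clones times the average read length, and the percentage of the genome sampled is that amount divided by the genome size. The short Python sketch below recomputes both as a consistency check. The function and variable names are illustrative and not part of the original analysis, and it assumes one read per successfully sequenced clone. Results should match the table to within rounding of the last digit.

```python
# Library statistics copied from the table:
# (genome size in Mb, successfully sequenced clones, average read length in bp)
LIBRARIES = {
    "Gossypioides kirkii": (588, 1535, 753),
    "Gossypium raimondii": (880, 2815, 770),
    "Gossypium herbaceum": (1667, 4994, 704),
    "Gossypium exiguum": (2460, 6980, 704),
}


def coverage(genome_mb, reads, avg_read_bp):
    """Return (Mb sequenced, percent of haploid genome sampled).

    Assumes one read per successfully sequenced clone (an assumption
    made for this sketch, not stated in the source).
    """
    mb_sequenced = reads * avg_read_bp / 1e6
    percent = 100 * mb_sequenced / genome_mb
    return mb_sequenced, percent


def summarize(libraries=LIBRARIES):
    """Compute per-library coverage and the total Mb sequenced."""
    rows = {name: coverage(*stats) for name, stats in libraries.items()}
    total_mb = sum(mb for mb, _ in rows.values())
    return rows, total_mb


if __name__ == "__main__":
    rows, total = summarize()
    for name, (mb, pct) in rows.items():
        print(f"{name}: {mb:.2f} Mb sequenced, {pct:.2f}% of genome")
    print(f"Total: {total:.2f} Mb")
```

Because every library samples close to 0.2% of its genome, the absolute amount of sequence scales with genome size, which is why the G. exiguum library contributes the largest share of the total.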
Sequences were queried against GenBank using BLASTX and against each other using BLASTN, and repetitive sequences were subsequently classified into gypsy-like, copia-like, LINE-like, Mutator-like, hAT-like, En/Spm-like, tandem repeats, and unknown repetitive classes. Copy numbers for each class were estimated using a novel modeling approach.
Congruent with results from plant taxa studied to date, we found that the majority of the Gossypium genome consists of dispersed repetitive sequences. Copy number and density estimates that include all dispersed repeats indicate that a minimum of 40-65% of each genome is composed of transposable elements. In agreement with results from other well-studied taxa, most of this repetitive fraction consists of Class I retrotransposons, particularly gypsy-like sequences. Class II DNA transposons make up only a minor fraction of the Gossypium genomes (~2%) (Fig. 1). Additionally, there was no significant variation in copy number among tandem repeats.
FIGURE 1

A key conclusion of this analysis is that different types of repetitive sequences have accumulated at different rates in different plant lineages. This is well illustrated by the gypsy-like sequences designated “Gorge” (for Gossypium retrotransposon gypsy-like element). Phylogenetic analysis of 373 gypsy-like reverse transcriptase sequences assembled from the four WGS libraries revealed three distinct classes, designated Gorge1, Gorge2, and Gorge3 (Fig. 2). Gorge1 is similar to the Arabidopsis thaliana gypsy sequence Athila, Gorge2 is similar to maize Cinful, and Gorge3 is similar to dea1 from Ananas comosus and del1-46 from Lilium henryi. Copy number calculations revealed relatively stable copy numbers for Gorge1 and Gorge2 across all four species, but a profound increase in the copy number of Gorge3 in the larger-genome species. This suggests that differential, lineage-specific amplification of transposable elements occurs not only among different repetitive families, but also among different clades of elements within each family of retrotransposons.
FIGURE 2
