De novo transcriptome assembly of polyploid organisms: insights from working with diploid and tetraploid wheat

Wheat (Triticum spp.) was domesticated at the dawn of agriculture ~10,000 years ago and has since been adapted to grow in different environments throughout the world. Today, wheat remains among the three most cultivated crops and is a major source of calories and protein for the human population. Cultivated wheat varieties were derived from two species, hexaploid Triticum aestivum (bread wheat) and tetraploid Triticum turgidum (durum or pasta wheat). Like many other grasses, wheat originated from interspecific hybridization events, which produced the AABB genome of T. turgidum and the AABBDD genome of T. aestivum. The A genome was contributed by the wild diploid species T. urartu, the B genome by an extinct Aegilops species from the Section Sitopsis, and the D genome by Aegilops tauschii. Wheat belongs to the Poaceae family (also called Gramineae or true grasses). Other prominent grasses with sequenced genomes include Oryza sativa (rice) and Zea mays (corn). The total size of one diploid copy of the wheat genome is estimated to be around 6000 Mb, which is 15 times the size of the rice genome and 50 times the size of the Arabidopsis genome. Despite the large size of the genome, the gene-coding region of wheat is comparable to that of other grasses and constitutes only 1-2% of the total wheat genome.

In this study, we sequenced and assembled the transcriptomes of tetraploid Triticum turgidum and its close diploid relative Triticum urartu. We examined several parameters that affect the assembly of polyploid wheat relative to diploid wheat and incorporated the additional steps necessary for genome-specific assembly.