Dataset Information

The Genomes of Oryza sativa: a history of duplications.

ABSTRACT: We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000-40,000. Only 2%-3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.

SUBMITTER: Yu J

PROVIDER: S-EPMC546038 | biostudies-literature | 2005 Feb

REPOSITORIES: biostudies-literature


Publications

The Genomes of Oryza sativa: a history of duplications.

Yu Jun, Wang Jun, Lin Wei, Li Songgang, Li Heng, Zhou Jun, Ni Peixiang, Dong Wei, Hu Songnian, Zeng Changqing, Zhang Jianguo, Zhang Yong, Li Ruiqiang, Xu Zuyuan, Li Shengting, Li Xianran, Zheng Hongkun, Cong Lijuan, Lin Liang, Yin Jianning, Geng Jianing, Li Guangyuan, Shi Jianping, Liu Juan, Lv Hong, Li Jun, Wang Jing, Deng Yajun, Ran Longhua, Shi Xiaoli, Wang Xiyin, Wu Qingfa, Li Changfeng, Ren Xiaoyu, Wang Jingqiang, Wang Xiaoling, Li Dawei, Liu Dongyuan, Zhang Xiaowei, Ji Zhendong, Zhao Wenming, Sun Yongqiao, Zhang Zhenpeng, Bao Jingyue, Han Yujun, Dong Lingli, Ji Jia, Chen Peng, Wu Shuming, Liu Jinsong, Xiao Ying, Bu Dongbo, Tan Jianlong, Yang Li, Ye Chen, Zhang Jingfen, Xu Jingyi, Zhou Yan, Yu Yingpu, Zhang Bing, Zhuang Shulin, Wei Haibin, Liu Bin, Lei Meng, Yu Hong, Li Yuanzhe, Xu Hao, Wei Shulin, He Ximiao, Fang Lijun, Zhang Zengjin, Zhang Yunze, Huang Xiangang, Su Zhixi, Tong Wei, Li Jinhong, Tong Zongzhong, Li Shuangli, Ye Jia, Wang Lishun, Fang Lin, Lei Tingting, Chen Chen, Chen Huan, Xu Zhao, Li Haihong, Huang Haiyan, Zhang Feng, Xu Huayong, Li Na, Zhao Caifeng, Li Shuting, Dong Lijun, Huang Yanqing, Li Long, Xi Yan, Qi Qiuhui, Li Wenjie, Zhang Bo, Hu Wei, Zhang Yanling, Tian Xiangjun, Jiao Yongzhi, Liang Xiaohu, Jin Jiao, Gao Lei, Zheng Weimou, Hao Bailin, Liu Siqi, Wang Wen, Yuan Longping, Cao Mengliang, McDermott Jason, Samudrala Ram, Wang Jian, Wong Gane Ka-Shu, Yang Huanming

PLoS Biology, 2005 Feb 1, issue 2


PMID: 15685292

Similar Datasets

Project description: Nine different regions totaling 9.7 Mb of the 4.02 Gb Aegilops tauschii genome were sequenced using Sanger sequencing technology and compared with orthologous Brachypodium distachyon, Oryza sativa (rice), and Sorghum bicolor (sorghum) genomic sequences. The ancestral gene content in these regions was inferred and used to estimate gene deletion and gene duplication rates along each branch of the phylogenetic tree relating the four species. The total gene number in the extant Ae. tauschii genome was estimated to be 36,371. The gene deletion and gene duplication rates and total gene numbers in the four genomes were used to estimate the total gene number at each node of the phylogenetic tree. The common ancestor of the Brachypodieae and Triticeae lineages was estimated to have had 28,558 genes, and the common ancestor of the Panicoideae, Ehrhartoideae, and Pooideae subfamilies was estimated to have had 27,152 or 28,350 genes, depending on the ancestral gene scenario. Relative to the Brachypodieae and Triticeae common ancestor, the gene number was reduced in B. distachyon by 3,026 genes and increased in Ae. tauschii by 7,813 genes. The sum of gene deletion and gene duplication rates, which reflects the rate of gene synteny loss, was correlated with the rate of structural chromosome rearrangements and was highest in the Ae. tauschii lineage and lowest in the rice lineage. The high rate of gene space evolution in the Ae. tauschii lineage accounts for the fact that, contrary to expectations, the level of synteny between the more closely related Ae. tauschii and B. distachyon genomes is similar to the level of synteny between the Ae. tauschii genome and the genomes of the more distantly related rice and sorghum. The ratio of gene duplication to gene deletion rates in these four grass species closely parallels both the total number of genes in a species and the overall genome size. Because the overall genome size is to a large extent a function of the repeated sequence content in a genome, we suggest that the amount and activity of repeated sequences are important factors determining the number of genes in a genome.

