Dataset Information

Efficient plant gene identification based on interspecies mapping of full-length cDNAs.


ABSTRACT: We present an annotation pipeline that accurately predicts exon-intron structures and protein-coding sequences (CDSs) on the basis of full-length cDNAs (FLcDNAs). This annotation pipeline was used to identify genes in 10 plant genomes. In particular, we show that interspecies mapping of FLcDNAs to genomes is of great value in fully utilizing FLcDNA resources whose availability is limited to several species. Because low sequence conservation at 5'- and 3'-ends of FLcDNAs between different species tends to result in truncated CDSs, we developed an improved algorithm to identify complete CDSs by the extension of both ends of truncated CDSs. Interspecies mapping of 71 801 monocot FLcDNAs to the Oryza sativa genome led to the detection of 22 142 protein-coding regions. Moreover, in comparing two mapping programs and three ab initio prediction programs, we found that our pipeline was more capable of identifying complete CDSs. As demonstrated by monocot interspecies mapping, in which nucleotide identity between FLcDNAs and the genome was ~80%, the resultant inferred CDSs were sufficiently accurate. Finally, we applied both inter- and intraspecies mapping to 10 monocot and dicot genomes and identified genes in 210 551 loci. Interspecies mapping of FLcDNAs is expected to effectively predict genes and CDSs in newly sequenced genomes.
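
ILLUSTRATION: The abstract mentions extending both ends of a truncated CDS but does not describe the algorithm itself. The following is only a minimal sketch of one generic way such an extension could work, not the authors' method. It assumes a forward-strand, intron-free (already spliced) sequence, an in-frame truncated CDS given as a half-open interval, and the choice of the most upstream in-frame ATG as the start codon. The function name `extend_cds` and all parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def extend_cds(seq, start, end):
    """Extend an in-frame, possibly truncated CDS [start, end).

    Assumptions (not taken from the paper): forward strand, no introns,
    and the most upstream in-frame ATG is preferred as the start codon.
    Returns (new_start, new_end, complete).
    """
    seq = seq.upper()
    if (end - start) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")

    # 5' extension: walk upstream one codon at a time until an in-frame
    # stop codon or the sequence start, remembering the last ATG seen.
    new_start = start if seq[start:start + 3] == START_CODON else None
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon == START_CODON:
            new_start = pos
        pos -= 3

    # 3' extension: walk downstream until the first in-frame stop codon.
    new_end = end if seq[end - 3:end] in STOP_CODONS else None
    pos = end
    while new_end is None and pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            new_end = pos + 3
        pos += 3

    complete = new_start is not None and new_end is not None
    return (
        start if new_start is None else new_start,
        end if new_end is None else new_end,
        complete,
    )


if __name__ == "__main__":
    toy = "TAGATGGCCGAATTTGGGTAACC"  # made-up sequence for illustration
    print(extend_cds(toy, 9, 15))
```

On the toy sequence, the truncated interval covering `GAATTT` is extended to the full open reading frame from the upstream ATG to the downstream TAA. A real pipeline working on genomic coordinates would also have to handle introns, the reverse strand, and CDSs with no start or stop codon within reach.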

SUBMITTER: Amano N 

PROVIDER: S-EPMC2955710 | biostudies-literature | 2010 Oct

REPOSITORIES: biostudies-literature

Publications

Efficient plant gene identification based on interspecies mapping of full-length cDNAs.

Amano N, Tanaka T, Numa H, Sakai H, Itoh T

DNA Research: an international journal for rapid publication of reports on genes and genomes. 2010-07-28; issue 5


Similar Datasets

| S-EPMC2774520 | biostudies-literature
| S-EPMC186637 | biostudies-literature
| S-EPMC186662 | biostudies-literature
| S-EPMC549067 | biostudies-literature
| S-EPMC2780955 | biostudies-literature
| S-EPMC311163 | biostudies-literature
2022-04-21 | GSE190930 | GEO
| S-EPMC311073 | biostudies-literature
| S-EPMC4675717 | biostudies-literature
| S-EPMC99097 | biostudies-literature