
Dataset Information


Finding genes in Schistosoma japonicum: annotating novel genomes with help of extrinsic evidence.


ABSTRACT: We have developed a novel method for estimating the parameters of hidden Markov models for gene finding in newly sequenced species. Our approach does not rely on curated training data sets; instead, it uses extrinsic evidence (including paired-end ditags, which have not previously been used in gene finding) and iterative training. This new method is particularly suitable for annotating species that are at a large evolutionary distance from the closest annotated species. We have used our approach to produce an initial annotation of more than 16,000 genes in the newly sequenced Schistosoma japonicum draft genome. We established the high quality of our predictions by comparison to full-length cDNAs (withheld from the extrinsic evidence) and to CEGMA core genes. We also evaluated the effectiveness of the new training procedure on the Caenorhabditis elegans genome. ExonHunter and the newest parameter files for the S. japonicum genome are available for download at www.bioinformatics.uwaterloo.ca/downloads/exonhunter.
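The abstract validates predictions against full-length cDNAs that were held out from the evidence. One common way to score such a comparison in gene finding is exon-level sensitivity (the fraction of reference exons predicted with exactly the right boundaries) and specificity (the fraction of predicted exons that exactly match a reference exon). The following is a minimal Python sketch of those two measures, assuming exons are represented as hypothetical (start, end, strand) tuples on a single sequence. It is not ExonHunter code, and the exact criteria used in the study may differ.

```python
from typing import Iterable, Tuple

# Hypothetical exon representation (an assumption, not from the paper):
# (start, end, strand) with 1-based inclusive coordinates on one sequence.
Exon = Tuple[int, int, str]


def exon_accuracy(predicted: Iterable[Exon], reference: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exact-exon level.

    sensitivity = reference exons matched exactly / all reference exons
    specificity = predicted exons matched exactly / all predicted exons
    (gene-finding convention: "specificity" here is a precision-like ratio)
    """
    pred = set(predicted)
    ref = set(reference)
    matched = pred & ref  # exons with identical boundaries and strand
    sensitivity = len(matched) / len(ref) if ref else 0.0
    specificity = len(matched) / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy, made-up exons: a held-out cDNA alignment vs. a gene finder's output.
    cdna_exons = [(100, 250, "+"), (400, 520, "+"), (700, 880, "+")]
    predicted_exons = [(100, 250, "+"), (400, 530, "+"), (700, 880, "+"), (950, 1010, "+")]
    sn, sp = exon_accuracy(predicted_exons, cdna_exons)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

With these toy inputs the script prints `sensitivity=0.67 specificity=0.50`: two of the three reference exons are recovered exactly, and two of the four predicted exons are correct. Requiring exact boundary matches is a deliberately strict choice; overlap-based or nucleotide-level scores are common alternatives.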

SUBMITTER: Brejova B 

PROVIDER: S-EPMC2673418 | biostudies-literature | 2009 Apr

REPOSITORIES: biostudies-literature


Publications

Finding genes in Schistosoma japonicum: annotating novel genomes with help of extrinsic evidence.

Brejová Brona, Vinar Tomás, Chen Yangyi, Wang Shengyue, Zhao Guoping, Brown Daniel G, Li Ming, Zhou Yan

Nucleic Acids Research, 2009-03-05, issue 7



Similar Datasets

| S-EPMC2770079 | biostudies-literature
2012-03-05 | GSE3641 | GEO
| S-EPMC2603315 | biostudies-literature
| S-EPMC4349608 | biostudies-literature
| S-EPMC421630 | biostudies-literature
| S-EPMC4811333 | biostudies-literature
| S-EPMC3203850 | biostudies-literature
| S-EPMC3065681 | biostudies-literature
2012-03-05 | E-GEOD-3641 | biostudies-arrayexpress
| S-EPMC5316221 | biostudies-literature