Dataset Information

Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm.


ABSTRACT: We present a new approach to automatic training of a eukaryotic ab initio gene finding algorithm. With the advent of Next-Generation Sequencing, automatic training has become paramount, allowing genome annotation pipelines to keep pace with the speed of genome sequencing. Earlier we developed GeneMark-ES, currently the only gene finding algorithm for eukaryotic genomes that performs automatic training in unsupervised ab initio mode. The new algorithm, GeneMark-ET, augments GeneMark-ES with a novel method that integrates RNA-Seq read alignments into the self-training procedure. Use of 'assembled' RNA-Seq transcripts is far from trivial; a significant error rate of assembly was revealed in recent assessments. We demonstrated in computational experiments that the proposed method of incorporating 'unassembled' RNA-Seq reads improves the accuracy of gene prediction; in particular, for the 1.3 Gb genome of Aedes aegypti, the mean value of prediction Sensitivity and Specificity at the gene level increased over GeneMark-ES by 24.5%. In the current surge of genomic data, when the need for accurate sequence annotation is higher than ever, GeneMark-ET will be a valuable addition to the narrow arsenal of automatic gene prediction tools.
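
The accuracy figure quoted in the abstract is a mean of gene-level Sensitivity and Specificity. A minimal sketch of how such a value could be computed is given below. It assumes the convention common in gene-finding evaluation, where Sensitivity = TP / (TP + FN) and Specificity = TP / (TP + FP), and it assumes that a predicted gene counts as correct only when its full exon structure matches an annotated gene exactly. The function name, the tuple layout, and the toy coordinates are hypothetical and are not taken from the GeneMark-ET software or the paper's evaluation scripts.

```python
from typing import Set, Tuple

# Hypothetical gene representation: (sequence id, strand, exon coordinates).
Exon = Tuple[int, int]
Gene = Tuple[str, str, Tuple[Exon, ...]]


def gene_level_accuracy(predicted: Set[Gene], annotated: Set[Gene]):
    """Return (sensitivity, specificity, mean) at the gene level.

    Assumption: a prediction is a true positive only if it matches an
    annotated gene exactly (same sequence, strand, and all exon bounds).
    """
    true_positives = len(predicted & annotated)
    # Sensitivity: fraction of annotated genes that were predicted exactly.
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    # Specificity (gene-finding convention): fraction of predictions that are correct.
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy data, invented for illustration only.
annotated = {
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "-", ((900, 1200),)),
    ("chr2", "+", ((50, 150), (250, 350), (500, 650))),
    ("chr2", "-", ((2000, 2300),)),
}
predicted = {
    ("chr1", "+", ((100, 200), (300, 400))),            # exact match
    ("chr1", "-", ((900, 1200),)),                      # exact match
    ("chr2", "+", ((50, 150), (250, 350), (500, 650))), # exact match
    ("chr2", "-", ((2000, 2250),)),                     # wrong exon end
    ("chr3", "+", ((10, 90),)),                         # not annotated
}

sn, sp, mean_accuracy = gene_level_accuracy(predicted, annotated)
print(f"Sn={sn:.3f} Sp={sp:.3f} mean={mean_accuracy:.3f}")
```

Comparing this mean for two predictors on the same annotation gives the kind of difference the abstract reports; the exact matching rules used in the original experiments may differ from the assumption made here.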

SUBMITTER: Lomsadze A 

PROVIDER: S-EPMC4150757 | biostudies-literature | 2014 Sep

REPOSITORIES: biostudies-literature

Publications

Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm.

Alexandre Lomsadze, Paul D. Burns, Mark Borodovsky

Nucleic Acids Research, 2014-07-02, issue 15


Similar Datasets

| S-EPMC5766199 | biostudies-literature
| S-EPMC3467739 | biostudies-literature
| S-EPMC4064318 | biostudies-literature
| S-EPMC2952873 | biostudies-other
| S-EPMC5550947 | biostudies-other
| S-EPMC4474535 | biostudies-literature
| S-EPMC6701478 | biostudies-literature
| S-EPMC4111418 | biostudies-literature
| S-EPMC8088329 | biostudies-literature
| S-EPMC3480842 | biostudies-other