Dataset Information

GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins.


ABSTRACT: We have made several steps toward creating a fast and accurate algorithm for gene prediction in eukaryotic genomes. First, we introduced an automated method for efficient ab initio gene finding, GeneMark-ES, with parameters trained in iterative unsupervised mode. Next, in GeneMark-ET we proposed a method for integrating unsupervised training with information on intron positions revealed by mapping short RNA reads. Now we describe GeneMark-EP, a tool that utilizes another source of external information, a protein database, readily available prior to the start of a sequencing project. A new specialized pipeline, ProtHint, initiates massive protein mapping to the genome and extracts hints to splice sites and translation start and stop sites of potential genes. GeneMark-EP uses the hints to improve estimation of model parameters as well as to adjust coordinates of predicted genes if they disagree with the most reliable hints (the -EP+ mode). Tests of GeneMark-EP and -EP+ demonstrated improvements in gene prediction accuracy in comparison with GeneMark-ES, while GeneMark-EP+ showed higher accuracy than GeneMark-ET. We observed that the most pronounced improvements in gene prediction accuracy occurred in large eukaryotic genomes.

SUBMITTER: Bruna T 

PROVIDER: S-EPMC7222226 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature

Publications

GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins.

Tomáš Brůna, Alexandre Lomsadze, Mark Borodovsky

NAR Genomics and Bioinformatics, 2020-05-13, 2


Similar Datasets

S-EPMC7787252 | biostudies-literature
S-EPMC9882169 | biostudies-literature
S-EPMC1298918 | biostudies-literature
S-EPMC8153819 | biostudies-literature
S-EPMC11216313 | biostudies-literature
S-EPMC5333190 | biostudies-literature
S-EPMC3089888 | biostudies-literature
S-EPMC5305209 | biostudies-literature
S-EPMC4556409 | biostudies-literature
S-EPMC9063120 | biostudies-literature