Dataset Information

BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.


ABSTRACT:

Motivation

Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that reaches state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene finder that usually requires supervised training and uses information from RNA-Seq reads in the prediction step. The complementary strengths of GeneMark-ET and AUGUSTUS motivated the design of a new combined tool for automatic gene prediction.

Results

We present BRAKER1, a pipeline for unsupervised RNA-Seq-based genome annotation that combines the advantages of GeneMark-ET and AUGUSTUS. As input, BRAKER1 requires a genome assembly file and a file in BAM format containing spliced alignments of RNA-Seq reads to the genome. First, GeneMark-ET performs iterative training and generates initial gene structures. Second, AUGUSTUS uses the predicted genes for training and then integrates RNA-Seq read information into the final gene predictions. In our experiments, BRAKER1 was more accurate than MAKER2 when RNA-Seq was the sole source of evidence for training and prediction. BRAKER1 does not require pre-trained parameters or a separate expert-prepared training step.
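
The two inputs described above can be illustrated with a minimal Python sketch. It is not part of BRAKER1. The helper names (intron_lengths, is_spliced, build_braker_command) are hypothetical. The --genome, --bam and --species options are assumed from the BRAKER documentation and should be checked against the installed version. The only fact relied on from the SAM format is that the CIGAR operation N marks a skipped reference region, which for RNA-Seq reads corresponds to an intron.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(cigar):
    """Return the lengths of all N (skipped region) operations in a CIGAR string."""
    return [int(length) for length, op in _CIGAR_OP.findall(cigar) if op == "N"]


def is_spliced(cigar):
    """True if the alignment skips at least one reference region (an intron)."""
    return len(intron_lengths(cigar)) > 0


def build_braker_command(genome, bam, species=None):
    """Assemble an argument list for a BRAKER run (hypothetical helper).

    Assumption: option names follow the BRAKER documentation; nothing is
    executed here, the list is only built.
    """
    command = ["braker.pl", "--genome={}".format(genome), "--bam={}".format(bam)]
    if species is not None:
        command.append("--species={}".format(species))
    return command


if __name__ == "__main__":
    # A read aligned across one intron of 1000 bases, and an unspliced read.
    print(is_spliced("50M1000N50M"), is_spliced("100M"))
    print(" ".join(build_braker_command("genome.fa", "rnaseq.bam")))
```

The argument list could be passed to subprocess.run once the paths point to real files. Only spliced reads carry the intron information that GeneMark-ET uses during training, which is why the alignments must come from a splice-aware aligner.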

Availability and implementation

BRAKER1 is available for download at http://bioinf.uni-greifswald.de/bioinf/braker/ and http://exon.gatech.edu/GeneMark/

Contact

katharina.hoff@uni-greifswald.de or borodovsky@gatech.edu

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Hoff KJ 

PROVIDER: S-EPMC6078167 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature

Publications

BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS.

Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M

Bioinformatics (Oxford, England), 2015-11-11, issue 5


Similar Datasets

| S-EPMC11216308 | biostudies-literature
| S-EPMC10312602 | biostudies-literature
| S-EPMC7787252 | biostudies-literature
| S-EPMC9528981 | biostudies-literature
| S-EPMC10120627 | biostudies-literature
| S-EPMC7397036 | biostudies-literature
| S-EPMC5712117 | biostudies-literature
| S-EPMC3219749 | biostudies-literature
| S-EPMC6505119 | biostudies-literature
| S-EPMC5824900 | biostudies-literature