Dataset Information

AGOUTI: improving genome assembly and annotation using transcriptome data.

ABSTRACT: Genomes sequenced using short-read, next-generation sequencing technologies can have many errors and may be fragmented into thousands of small contigs. These incomplete and fragmented assemblies lead to errors in gene identification, such that single genes spread across multiple contigs are annotated as separate gene models. Such biases can confound inferences about the number and identity of genes within species, as well as gene gain and loss between species. We present AGOUTI (Annotated Genome Optimization Using Transcriptome Information), a tool that uses RNA sequencing data to simultaneously combine contigs into scaffolds and fragmented gene models into single models. We show that AGOUTI improves both the contiguity of genome assemblies and the accuracy of gene annotation, providing updated versions of each as output. Running AGOUTI on both simulated and real datasets, we show that it is highly accurate and that it achieves greater accuracy and contiguity when compared with other existing methods. AGOUTI is a powerful and effective scaffolder and, unlike most scaffolders, is expected to be more effective in larger genomes because of the commensurate increase in intron length. AGOUTI is able to scaffold thousands of contigs while simultaneously reducing the number of gene models by hundreds or thousands. The software is available free of charge under the MIT license.

SUBMITTER: Zhang SV

PROVIDER: S-EPMC4952227 | biostudies-literature | 2016 Jul

REPOSITORIES: biostudies-literature

Publications

AGOUTI: improving genome assembly and annotation using transcriptome data.

Simo V. Zhang, Luting Zhuo, Matthew W. Hahn

GigaScience, 2016-07-19, issue 1

PMID: 27435057

Similar Datasets

ChopStitch: exon annotation and splice graph construction using transcriptome assembly and whole genome sequencing data.
| S-EPMC5946899 | biostudies-literature

Improving the ostrich genome assembly using optical mapping data.
| S-EPMC4427950 | biostudies-literature

Improving silkworm genome annotation using a proteogenomics approach
2019-07-01 | PXD009672 | Pride

Improving silkworm genome annotation using a proteogenomics approach
2019-07-01 | PXD009697 | Pride

Improving pan-genome annotation using whole genome multiple alignment.
| S-EPMC3142524 | biostudies-literature

Improving Human Genome Annotation Using a High-Stringency Proteogenomics Workflow
2016-05-27 | PXD002967 | Pride

Improving the Caenorhabditis elegans genome annotation using machine learning.
| S-EPMC1808025 | biostudies-literature

Improving eukaryotic genome annotation using single molecule mRNA sequencing.
| S-EPMC5833154 | biostudies-literature

Complete assembly of the Leishmania donovani (HU3 strain) genome and transcriptome annotation.
| S-EPMC6467909 | biostudies-literature

Improving the Annotation of Arabidopsis lyrata Using RNA-Seq Data.
| S-EPMC4575116 | biostudies-literature