Dataset Information

NCBI prokaryotic genome annotation pipeline.

ABSTRACT: Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment-based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, NCBI's new Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, and more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP, see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.

SUBMITTER: Tatusova T

PROVIDER: S-EPMC5001611 | biostudies-literature | 2016 Aug

REPOSITORIES: biostudies-literature

Publications

NCBI prokaryotic genome annotation pipeline.

Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, Lomsadze A, Pruitt KD, Borodovsky M, Ostell J

Nucleic Acids Research, 2016 Jun 24, issue 14

PMID: 27342282

Similar Datasets

DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. | S-EPMC5860143 | biostudies-literature

MyPro: A seamless pipeline for automated prokaryotic genome assembly and annotation. | S-EPMC4828917 | biostudies-literature

RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation. | S-EPMC7779008 | biostudies-literature

VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank. | S-EPMC6343335 | biostudies-literature

Collection and curation of prokaryotic genome assemblies from type strains at NCBI. | S-EPMC10228379 | biostudies-literature

Genome Annotation Generator: a simple tool for generating and correcting WGS annotation tables for NCBI submission. | S-EPMC5887294 | biostudies-literature

GAAP: A Genome Assembly + Annotation Pipeline. | S-EPMC6617929 | biostudies-literature

RefSeq: an update on prokaryotic genome annotation and curation. | S-EPMC5753331 | biostudies-literature

GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes. | S-EPMC5310280 | biostudies-literature

Prokaryotic phylogenies inferred from whole-genome sequence and annotation data. | S-EPMC3773407 | biostudies-literature