Dataset Information

GAAP: A Genome Assembly + Annotation Pipeline.

ABSTRACT: Genomic analysis begins with de novo assembly of short-read fragments to reconstruct full-length base sequences without relying on a reference genome sequence. Then, in the annotation step, gene locations are identified within the base sequences, and the structures and functions of these genes are determined. Recently, a wide range of powerful tools has been developed and published for whole-genome analysis, enabling even individual researchers in small laboratories to perform whole-genome analyses on their organisms of interest. However, these analytical tools are generally complex and use diverse algorithms, parameter-setting methods, and input formats; thus, it remains difficult for individual researchers to select, use, and combine these tools to obtain their final results. To resolve these issues, we have developed a genome analysis pipeline (GAAP) for semiautomated, iterative, and high-throughput analysis of whole-genome data. This pipeline is designed to perform read correction, de novo genome (transcriptome) assembly, gene prediction, and functional annotation using a range of proven tools and databases. We aim to assist non-IT researchers by describing each stage of analysis in detail and discussing current approaches. We also provide practical advice on how to access and use the bioinformatics tools and databases and how to implement the provided suggestions. Whole-genome analysis of Toxocara canis is used as a case study to show intermediate results at each stage, demonstrating the practicality of the proposed method.

SUBMITTER: Kong J

PROVIDER: S-EPMC6617929 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

GAAP: A Genome Assembly + Annotation Pipeline.

Jinhwa Kong, Sun Huh, Jung-Im Won, Jeehee Yoon, Baeksop Kim, Kiyong Kim

BioMed Research International, 2019-06-26

PMID: 31346518

Similar Datasets

LGAAP: Leishmaniinae Genome Assembly and Annotation Pipeline.
| S-EPMC8297458 | biostudies-literature

MyPro: A seamless pipeline for automated prokaryotic genome assembly and annotation.
| S-EPMC4828917 | biostudies-literature

ToxCodAn-Genome: an automated pipeline for toxin-gene annotation in genome assembly of venomous lineages.
| S-EPMC10797961 | biostudies-literature

PipeCoV: a pipeline for SARS-CoV-2 genome assembly, annotation and variant identification.
| S-EPMC9013232 | biostudies-literature

A Standardized Pipeline for Assembly and Annotation of African Swine Fever Virus Genome.
| S-EPMC11359534 | biostudies-literature

NCBI prokaryotic genome annotation pipeline.
| S-EPMC5001611 | biostudies-literature

MEGAnnotator2: a pipeline for the assembly and annotation of microbial genomes.
| S-EPMC10696586 | biostudies-literature

DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication.
| S-EPMC5860143 | biostudies-literature

VirusTAP: Viral Genome-Targeted Assembly Pipeline.
| S-EPMC4735447 | biostudies-literature

MetaSanity: an integrated microbial genome evaluation and annotation pipeline.
| S-EPMC7520038 | biostudies-literature