Dataset Information

A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.


ABSTRACT: Genome projects now produce draft assemblies within weeks owing to advanced high-throughput sequencing technologies. For milestone projects such as Escherichia coli or Homo sapiens, teams of scientists were employed to manually curate and finish these genomes to a high standard. Nowadays, this is not feasible for most projects, and the quality of genomes is generally of a much lower standard. This protocol describes software (PAGIT) that is used to improve the quality of draft genomes. It offers flexible functionality to close gaps in scaffolds, correct base errors in the consensus sequence and exploit reference genomes (if available) to improve scaffolding and generate annotations. The protocol is most accessible for bacterial and small eukaryotic genomes (up to 300 Mb), such as pathogenic bacteria, malaria and parasitic worms. Applying PAGIT to an E. coli assembly takes ~24 h: it doubles the average contig size and annotates over 4,300 gene models.
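
The abstract reports improvement in terms of average contig size. As a rough illustration of how such a before/after comparison could be checked, the following minimal Python sketch computes mean contig length and N50 from FASTA-formatted text. It is not part of PAGIT; the function names `contig_lengths` and `n50` and the toy sequences are hypothetical and chosen only for this example.

```python
from statistics import mean


def contig_lengths(fasta_text):
    """Return the length of each sequence record in FASTA-formatted text."""
    lengths = []
    current = None  # stays None until the first '>' header is seen
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy data (hypothetical): four draft contigs joined into two longer sequences.
draft = ">c1\nACGTACGT\n>c2\nACGT\n>c3\nAC\n>c4\nAC\n"
improved = ">s1\nACGTACGTACGT\n>s2\nACAC\n"

draft_lengths = contig_lengths(draft)
improved_lengths = contig_lengths(improved)

print("draft mean:", mean(draft_lengths), "N50:", n50(draft_lengths))
print("improved mean:", mean(improved_lengths), "N50:", n50(improved_lengths))
```

In practice the two inputs would be read from the assembly files before and after improvement; mean length alone can be skewed by many short contigs, which is why N50 is included alongside it.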

SUBMITTER: Swain MT 

PROVIDER: S-EPMC3648784 | biostudies-literature | 2012 Jun

REPOSITORIES: biostudies-literature

Publications

A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs.

Martin T Swain, Isheng J Tsai, Samual A Assefa, Chris Newbold, Matthew Berriman, Thomas D Otto

Nature Protocols, 2012-06-07, 7


Similar Datasets

| S-EPMC5381364 | biostudies-literature
| S-EPMC5084376 | biostudies-literature
| S-EPMC5925771 | biostudies-literature
| S-EPMC2723005 | biostudies-literature
| S-EPMC7488116 | biostudies-literature
| S-EPMC5741766 | biostudies-literature
| S-EPMC7703759 | biostudies-literature
| S-EPMC7483855 | biostudies-literature
| S-EPMC6582343 | biostudies-literature
| S-EPMC7066127 | biostudies-literature