Dataset Information

CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes.

ABSTRACT: BACKGROUND: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT: Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases the contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute-servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads. CONCLUSIONS: CSA can speed up and simplify chromosome-level assembly and significantly lower costs of large-scale family-level vertebrate genome projects.

SUBMITTER: Kuhl H

PROVIDER: S-EPMC7247394 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature

Publications

CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes.

Kuhl, Heiner; Li, Ling; Wuertz, Sven; Stöck, Matthias; Liang, Xu-Fang; Klopp, Christophe

GigaScience, 2020-05-01, issue 5

PMID: 32449778

Similar Datasets

A technical guide to TRITEX, a computational pipeline for chromosome-scale sequence assembly of plant genomes. | S-EPMC9719158 | biostudies-literature

Chromosome-scale, haplotype-resolved assembly of human genomes. | S-EPMC7954703 | biostudies-literature

Gamete binning: chromosome-level and haplotype-resolved genome assembly enabled by high-throughput single-cell sequencing of gamete genomes. | S-EPMC7771071 | biostudies-literature

TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. | S-EPMC6918601 | biostudies-literature

Chromosome-scale genome sequencing, assembly and annotation of six genomes from subfamily Leishmaniinae. | S-EPMC8421402 | biostudies-literature

High-throughput hyperdimensional vertebrate phenotyping. | S-EPMC3573763 | biostudies-literature

MICRA: an automatic pipeline for fast characterization of microbial genomes from high-throughput sequencing data. | S-EPMC5738152 | biostudies-literature

Sequencing and Chromosome-Scale Assembly of Plant Genomes, Brassica rapa as a Use Case. | S-EPMC8389630 | biostudies-literature

Redundans: an assembly pipeline for highly heterozygous genomes. | S-EPMC4937319 | biostudies-literature

High-throughput in vivo vertebrate screening. | S-EPMC2941625 | biostudies-literature