Dataset Information

Ultra-fast genome comparison for large-scale genomic experiments.

ABSTRACT: In the last decade, a technological shift in the bioinformatics field has occurred: larger genomes can now be sequenced quickly and cost-effectively, resulting in the computational need to efficiently compare large and abundant sequences. Furthermore, detecting conserved similarities across large collections of genomes remains a problem. The size of chromosomes, along with the substantial amount of noise and number of repeats found in DNA sequences (particularly in mammals and plants), leads to a scenario where executing and waiting for complete outputs is both time- and resource-consuming. Filtering steps, manual examination and annotation, very long execution times and a high demand for computational resources represent a few of the many difficulties faced in large genome comparisons. In this work, we provide a method designed for comparisons of considerable amounts of very long sequences that employs a heuristic algorithm capable of separating noise and repeats from conserved fragments in pairwise genomic comparisons. We provide a software implementation that runs in linear time, using as little as one core and a small, constant memory footprint. The method produces both a previsualization of the comparison and a collection of indices to drastically reduce computational complexity when performing exhaustive comparisons. Lastly, the method scores the comparison to automate classification of sequences and produces a list of detected synteny blocks to enable new evolutionary studies.
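The abstract does not spell out the heuristic, so the following is only an illustrative sketch of one generic way to separate repeats from conserved fragments and to build a coarse previsualization: keep only k-mers that occur exactly once in each sequence, then bin the shared ones into a small grid. The function names, the parameters `k` and `dim`, and the choice of unique k-mers are all assumptions made for illustration and are not taken from the publication. Unlike the method described above, this toy version holds a full k-mer index in memory.

```python
from collections import Counter


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position.

    Hypothetical helper: discarding repeated k-mers is a simple stand-in
    for filtering repeats; it is not the published algorithm.
    """
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return {seq[i:i + k]: i for i in range(n) if counts[seq[i:i + k]] == 1}


def coarse_hit_matrix(query, ref, k=4, dim=4):
    """Bin k-mers unique in both sequences into a dim x dim grid.

    The grid acts as a low-resolution dotplot: counts along the main
    diagonal suggest collinear conserved regions.
    """
    q_index = unique_kmer_positions(query, k)
    r_index = unique_kmer_positions(ref, k)
    grid = [[0] * dim for _ in range(dim)]
    for kmer, q_pos in q_index.items():
        r_pos = r_index.get(kmer)
        if r_pos is not None:
            # Scale sequence coordinates down to grid coordinates.
            row = q_pos * dim // len(query)
            col = r_pos * dim // len(ref)
            grid[row][col] += 1
    return grid
```

For a fixed `k`, both functions make a constant number of passes over each sequence, so the running time grows linearly with sequence length. Comparing a sequence against itself places every hit on the main diagonal of the grid.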

SUBMITTER: Perez-Wohlfeil E

PROVIDER: S-EPMC6635410 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Ultra-fast genome comparison for large-scale genomic experiments.

Pérez-Wohlfeil E, Diaz-Del-Pino S, Trelles O

Scientific Reports, 2019 Jul 16, issue 1

PMID: 31312019

Similar Datasets

Fast principal component analysis of large-scale genome-wide data. | S-EPMC3981753 | biostudies-literature

Robust meta-analysis for large-scale genomic experiments based on an empirical approach. | S-EPMC8832678 | biostudies-literature

Fast machine-learning online optimization of ultra-cold-atom experiments. | S-EPMC4867626 | biostudies-literature

EAMA: Empirically adjusted meta-analysis for large-scale simultaneous hypothesis testing in genomic experiments. | S-EPMC5663489 | biostudies-literature

A pan-CRISPR analysis of mammalian cell specificity identifies ultra-compact sgRNA subsets for genome-scale experiments. | S-EPMC8810922 | biostudies-literature

GenMap: ultra-fast computation of genome mappability. | S-EPMC7320602 | biostudies-literature

FVGWAS: Fast voxelwise genome wide association analysis of large-scale imaging genetic data. | S-EPMC4554832 | biostudies-literature

Large-scale genomic analysis of ovarian carcinomas. | S-EPMC5527877 | biostudies-other

SpeedSeq: ultra-fast personal genome analysis and interpretation. | S-EPMC4589466 | biostudies-literature

BPGA- an ultra-fast pan-genome analysis pipeline. | S-EPMC4829868 | biostudies-literature