Dataset Information

Discovery of large genomic inversions using long range information.

ABSTRACT: Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies. Here we propose a novel algorithm, VALOR, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of VALOR using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of VALOR against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data. In this paper, we show that VALOR is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using VALOR, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. VALOR is available at https://github.com/BilkentCompGen/VALOR.

SUBMITTER: Eslami Rasekh M

PROVIDER: S-EPMC5223412 | biostudies-literature | 2017 Jan

REPOSITORIES: biostudies-literature

Publications

Discovery of large genomic inversions using long range information.

Eslami Rasekh M, Chiatante G, Miroballo M, Tang J, Ventura M, Amemiya CT, Eichler EE, Antonacci F, Alkan C

BMC Genomics, 2017-01-10, 1

PMID: 28073353

Similar Datasets

Scaffolding of long read assemblies using long range contact information.
| S-EPMC5508778 | biostudies-literature

Identity inference of genomic data using long-range familial searches.
| S-EPMC7549546 | biostudies-literature

Genomic Surveillance of SARS-CoV-2 Using Long-Range PCR Primers.
| S-EPMC10369864 | biostudies-literature

Trans-oceanic genomic divergence of Atlantic cod ecotypes is associated with large inversions.
| S-EPMC5677996 | biostudies-literature

Integrating long-range connectivity information into de Bruijn graphs.
| S-EPMC6061703 | biostudies-literature

npInv: accurate detection and genotyping of inversions using long read sub-alignment.
| S-EPMC6044046 | biostudies-literature

Long-range movement of large mechanically interlocked DNA nanostructures.
| S-EPMC4980458 | biostudies-literature

Integrating germline and somatic variation information using genomic data for the discovery of biomarkers in prostate cancer.
| S-EPMC6417124 | biostudies-literature

Long-Range RNA Structural Information via a Paramagnetically Tagged Reporter Protein.
| S-EPMC6572783 | biostudies-literature

Large-scale discovery of protein interactions at residue resolution using co-evolution calculated from genomic sequences.
| S-EPMC7925567 | biostudies-literature