Dataset Information

Identifying micro-inversions using high-throughput sequencing reads.

ABSTRACT: The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are an important type of genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. The algorithm of MID is based on a dynamic programming path-finding approach. What distinguishes MID from other variant detection tools is that it can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low-coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low-coverage data, which makes it suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). Therefore, our tool is expected to be useful for improving the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID.
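To make the definition of a micro-inversion concrete, the following is a minimal Python sketch of what an MI looks like inside a single read. It is not MID's dynamic programming path-finding algorithm, which the abstract does not describe in detail. It is a brute-force illustration under strong assumptions: the read is already placed gaplessly against a reference window of the same length, there are no sequencing errors, and the read contains at most one inversion. The function names `reverse_complement` and `find_micro_inversion` and the `min_len` parameter are hypothetical and chosen only for this example.

```python
from typing import Optional, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def find_micro_inversion(read: str, ref: str, min_len: int = 4) -> Optional[Tuple[int, int]]:
    """Find the shortest span [start, end) whose reverse complement in the
    read equals the reference, with both flanks matching exactly.

    Assumes (for illustration only) an equal-length, gapless placement of
    the read on the reference window and no sequencing errors.
    """
    if len(read) != len(ref) or read == ref:
        return None
    mismatches = [i for i, (a, b) in enumerate(zip(read, ref)) if a != b]
    first, last = mismatches[0], mismatches[-1]
    # Any valid span must cover every mismatch, so the flanks outside a
    # candidate span are guaranteed to match the reference.
    candidates = [
        (s, e)
        for s in range(first + 1)
        for e in range(last + 1, len(read) + 1)
        if e - s >= min_len
    ]
    for s, e in sorted(candidates, key=lambda se: se[1] - se[0]):
        if reverse_complement(read[s:e]) == ref[s:e]:
            return (s, e)
    return None


# Simulate a read carrying an 8 bp inversion of the reference at [6, 14).
ref = "ACGTTGCAAGGCTTACCGAT"
read = ref[:6] + reverse_complement(ref[6:14]) + ref[14:]
span = find_micro_inversion(read, ref)
```

In this toy case `span` identifies the inverted segment. Real reads are harder: the read's position on the genome is unknown, sequencing errors occur, and a read may contain several breakpoints, which is the setting the abstract says MID is designed for.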

SUBMITTER: He F

PROVIDER: S-EPMC4895285 | biostudies-literature | 2016 Jan

REPOSITORIES: biostudies-literature

Publications

Identifying micro-inversions using high-throughput sequencing reads.

He Feifei, Li Yang, Tang Yu-Hang, Ma Jian, Zhu Huaiqiu

BMC Genomics, 2016-01-11

PMID: 26818118

Similar Datasets

Fulcrum: condensing redundant reads from high-throughput sequencing studies.
| S-EPMC3348557 | biostudies-literature

HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment.
| S-EPMC4948903 | biostudies-literature

Identifying and Tracking Low-Frequency Virus-Specific TCR Clonotypes Using High-Throughput Sequencing.
| S-EPMC7770954 | biostudies-literature

BLESS: bloom filter-based error correction solution for high-throughput sequencing reads.
| S-EPMC6365934 | biostudies-literature

Centroid based clustering of high throughput sequencing reads based on n-mer counts.
| S-EPMC3848435 | biostudies-literature

Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies.
| S-EPMC3119603 | biostudies-literature

CASPER: context-aware scheme for paired-end reads from high-throughput amplicon sequencing.
| S-EPMC4168710 | biostudies-literature

A novel method for identifying polymorphic transposable elements via scanning of high-throughput short reads.
| S-EPMC4909310 | biostudies-literature

Assessing the responses of Sphagnum micro-eukaryotes to climate changes using high throughput sequencing.
| S-EPMC7505061 | biostudies-literature

RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.
| S-EPMC4117746 | biostudies-literature