
Dataset Information


MindTheGap: integrated detection and assembly of short and long insertions.


ABSTRACT: MOTIVATION: Insertions play an important role in genome evolution. However, such variants are difficult to detect from short-read sequencing data, especially when they exceed the paired-end insert size. Many approaches have been proposed to call short insertion variants based on paired-end mapping. However, there remains a lack of practical methods to detect and assemble long variants. RESULTS: We propose here an original method, called MindTheGap, for the integrated detection and assembly of insertion variants from re-sequencing data. Importantly, it is designed to call insertions of any size, whether they are novel or duplicated, homozygous or heterozygous in the donor genome. MindTheGap uses an efficient k-mer-based method to detect insertion sites in a reference genome, and subsequently assemble them from the donor reads. MindTheGap showed high recall and precision on simulated datasets of various genome complexities. When applied to real Caenorhabditis elegans and human NA12878 datasets, MindTheGap detected and correctly assembled insertions >1 kb, using at most 14 GB of memory.
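As a rough illustration of what k-mer-based site detection can look like, the sketch below flags reference positions where a run of exactly k-1 consecutive reference k-mers is missing from the donor reads. It is a simplified toy under stated assumptions, not the MindTheGap implementation: it assumes error-free reads, a homozygous insertion, and no repeated k-mers around the site. The names `kmers` and `find_candidate_sites` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_candidate_sites(reference, donor_reads, k):
    """Return 0-based reference indices before which an insertion is suspected.

    Assumption of this toy model: an insertion placed just before reference
    index p breaks the k-1 reference k-mers that span that junction (those
    starting at p-k+1 .. p-1), so they are absent from the donor reads.
    """
    donor_kmers = set()
    for read in donor_reads:
        donor_kmers |= kmers(read, k)

    sites = []
    run_start = None
    n = len(reference) - k + 1  # number of k-mers in the reference
    # Iterate one step past the last k-mer so a trailing run is closed.
    for i in range(n + 1):
        absent = i < n and reference[i:i + k] not in donor_kmers
        if absent:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # Keep only runs of length k-1 flanked by present k-mers.
            if i - run_start == k - 1 and run_start > 0 and i < n:
                sites.append(i)
            run_start = None
    return sites
```

For example, with `reference = "ACGTTGCAAGGCTTACCGATAGTC"`, a donor built as `reference[:12] + "AAAAAAA" + reference[12:]`, and `k = 5`, the four reference 5-mers starting at indices 8 to 11 are absent from the donor, so index 12 is reported. Heterozygous insertions would not produce such a gap and are outside the scope of this sketch.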

SUBMITTER: Rizk G 

PROVIDER: S-EPMC4253827 | biostudies-literature | 2014 Dec

REPOSITORIES: biostudies-literature


Publications

MindTheGap: integrated detection and assembly of short and long insertions.

Rizk G, Gouin A, Chikhi R, Lemaitre C

Bioinformatics (Oxford, England), 2014-08-14, issue 24



Similar Datasets

| S-EPMC4545859 | biostudies-literature
2023-10-14 | GSE215355 | GEO
2023-10-14 | GSE215357 | GEO
2022-12-31 | GSE166709 | GEO
2011-02-28 | GSE27080 | GEO
2011-02-28 | E-GEOD-27080 | biostudies-arrayexpress
| S-EPMC3084767 | biostudies-literature
| S-EPMC9750119 | biostudies-literature
| S-EPMC4090127 | biostudies-literature
| S-EPMC4907386 | biostudies-literature