Dataset Information

The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes.

ABSTRACT:

Background

Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.

Results

This consensus exome INDEL call set contains 7,210 INDELs from 1,128 individuals across 13 populations in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.

Conclusions

In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.

SUBMITTER: Challis D

PROVIDER: S-EPMC4352271 | biostudies-literature | 2015 Feb

REPOSITORIES: biostudies-literature

Publications

The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes.

Danny Challis, Lilian Antunes, Erik Garrison, Eric Banks, Uday S Evani, Donna Muzny, Ryan Poplin, Richard A Gibbs, Gabor Marth, Fuli Yu

BMC Genomics, 2015 Feb 28

PMID: 25765891

Similar Datasets

Polygenic architecture of rare coding variation across 394,783 exomes.
| S-EPMC10614218 | biostudies-literature

Patterns of coding variation in the complete exomes of three Neandertals.
| S-EPMC4020111 | biostudies-literature

SOAPindel: efficient identification of indels from short paired reads.
| S-EPMC3530679 | biostudies-literature

RNAIndel: discovering somatic coding indels from tumor RNA-Seq data.
| S-EPMC7523641 | biostudies-literature

Fast neutron mutagenesis in soybean enriches for small indels and creates frameshift mutations.
| S-EPMC9335934 | biostudies-literature

Identification and analysis of short indels inducing exon extension/shrinkage events.
| S-EPMC11452298 | biostudies-literature

Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data.
| S-EPMC5549930 | biostudies-literature

Evolution and functional impact of rare coding variation from deep sequencing of human exomes.
| S-EPMC3708544 | biostudies-literature

Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.
| S-EPMC3676746 | biostudies-literature

Characterization of Arabian Peninsula whole exomes: Contributing to the catalogue of human diversity.
| S-EPMC9619305 | biostudies-literature