Dataset Information

The impact of post-alignment processing procedures on whole-exome sequencing data.

ABSTRACT: The use of post-alignment procedures has been suggested to prevent the identification of false positives in massive DNA sequencing data. Insertions and deletions are the variants most likely to be misinterpreted by variant calling algorithms. Using known genetic variants as references for post-processing pipelines can minimize mismatches: such references allow reads to be correctly realigned and recalibrated, resulting in more parsimonious variant calling. In this work, we aim to investigate the impact of using different sets of common variants as references to facilitate variant calling from whole-exome sequencing data. We selected reference variants from common insertions and deletions available within the 1K Genomes project data and from the Latin American Database of Genetic Variation (LatinGen). We used the Genome Analysis Toolkit to perform post-processing procedures, namely local realignment and quality recalibration, followed by variant calling in whole-exome samples. We identified an increased number of variants in the call set for all groups when no post-processing procedure was performed. We found a higher concordance rate between variants called using 1K Genomes and LatinGen. Therefore, we believe that the increased number of rare variants identified in the analysis without realignment or quality recalibration indicates that they were likely false positives.
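The abstract compares call sets by their concordance rate but this record does not say how that rate was computed. The following is a minimal sketch of one common way to do it, assuming concordance is the number of shared variants divided by the number of variants in either set (a Jaccard index); the authors may have used a different definition. The function names and the toy VCF lines are hypothetical and are not taken from the dataset.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from tab-separated VCF data lines."""
    keys = set()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic sites list several ALT alleles separated by commas.
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def concordance(calls_a, calls_b):
    """Shared variants divided by all variants seen in either call set."""
    union = calls_a | calls_b
    return len(calls_a & calls_b) / len(union) if union else 1.0


# Toy call sets (made-up records, for illustration only).
calls_1kg = variant_keys([
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "1\t200\t.\tAT\tA",
    "2\t50\t.\tC\tT,G",
])
calls_latingen = variant_keys([
    "1\t100\t.\tA\tG",
    "2\t50\t.\tC\tT",
    "3\t10\t.\tG\tGA",
])

rate = concordance(calls_1kg, calls_latingen)
```

Matching on exact position and alleles is the simplest choice; insertions and deletions can be represented in more than one way, so a real comparison would usually normalize them first.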

SUBMITTER: Borges MG

PROVIDER: S-EPMC7783507 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature

Publications

The impact of post-alignment processing procedures on whole-exome sequencing data.

Murilo Guimarães Borges, Helena Tadiello de Moraes, Cristiane de Souza Rocha, Iscia Lopes-Cendes

Genetics and Molecular Biology, 2020-11-13, issue 4

PMID: 33306778

Similar Datasets

PaCBAM: fast and scalable processing of whole exome and targeted sequencing data. | S-EPMC6933905 | biostudies-literature

Clinical whole-exome sequencing results impact medical management. | S-EPMC6305629 | biostudies-literature

EthSEQ: ethnicity annotation from whole exome sequencing data. | S-EPMC5818140 | biostudies-literature

Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. | S-EPMC4253833 | biostudies-literature

Whole-exome sequencing and its impact in hereditary hearing loss. | S-EPMC5503681 | biostudies-literature

Can whole-exome sequencing data be used for linkage analysis? | S-EPMC4929867 | biostudies-literature

EXCAVATOR: detecting copy number variants from whole-exome sequencing data. | S-EPMC4053953 | biostudies-literature

Whole-exome sequencing identified mutational profiles of urothelial carcinoma post kidney transplantation. | S-EPMC9301867 | biostudies-literature

Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. | S-EPMC4418901 | biostudies-literature

Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data. | S-EPMC5549930 | biostudies-literature