Dataset Information

Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data.

ABSTRACT: Large-scale initiatives aiming to recover the complete sequence of thousands of human genomes are currently under way worldwide, contributing to the generation of a comprehensive catalog of human genetic variation. The ultimate and most ambitious goal of human population-scale genomics is the characterization of the so-called human "variome" through the identification of causal mutations or haplotypes. Several research institutions worldwide currently use genotyping assays based on Next-Generation Sequencing (NGS) for diagnostics and clinical screening, and the widespread application of such technologies promises major revolutions in medical science. Bioinformatic analysis of human resequencing data is one of the main factors limiting the effectiveness and general applicability of NGS for clinical studies. The need to combine multiple tools into dedicated protocols that accommodate different types of data (gene panels, exomes, or whole genomes), together with the high variability of the data, makes it difficult to establish a single strategy of general use. While several studies already compare the sensitivity and accuracy of bioinformatic pipelines for the identification of single nucleotide variants from resequencing data, little is known about the impact of quality assessment and read pre-processing strategies. In this work we discuss the major strengths and limitations of the various genome resequencing protocols that are currently used in molecular diagnostics and for the discovery of novel disease-causing mutations. Using publicly available data, we devise and suggest a series of best practices for data pre-processing that consistently improve the outcome of genotyping with minimal impact on computational costs.

SUBMITTER: Chiara M

PROVIDER: S-EPMC5500642 | biostudies-literature | 2017

REPOSITORIES: biostudies-literature

Publications

Evaluation of Quality Assessment Protocols for High Throughput Genome Resequencing Data.

Chiara M, Pavesi G

Frontiers in Genetics, 2017-07-07

PMID: 28736571

Similar Datasets

High-throughput genotyping by whole-genome resequencing.
| S-EPMC2694477 | biostudies-literature

Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping.
| S-EPMC3248099 | biostudies-literature

Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles.
| S-EPMC3592458 | biostudies-other

High-throughput, high-accuracy array-based resequencing.
| S-EPMC2672536 | biostudies-literature

A probabilistic approach for SNP discovery in high-throughput human resequencing data.
| S-EPMC2752119 | biostudies-literature

Efficient high-throughput resequencing of genomic DNA.
| S-EPMC430165 | biostudies-literature

Quality Control of Quantitative High Throughput Screening Data.
| S-EPMC6520559 | biostudies-literature

High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE.
| S-EPMC5860548 | biostudies-literature

SVM²: an improved paired-end-based tool for the detection of small genomic structural variations using high-throughput single-genome resequencing data.
| S-EPMC3467043 | biostudies-literature

Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and genome analyzer systems.
| S-EPMC3334598 | biostudies-literature