Dataset Information

A fast, reliable and easy method to detect within-species DNA contamination.


ABSTRACT:

Background and aim

Next generation sequencing (NGS) is becoming the standard for clinical diagnosis. Several steps of NGS, such as DNA extraction, fragmentation, library preparation and amplification, require handling of samples, which makes the process susceptible to contamination. In diagnostic environments, sample contamination with DNA from the same species can lead to errors in diagnosis. Here we propose a simple method to detect within-sample contamination based on analysis of the allele ratio (AR) of heterozygous single nucleotide polymorphisms (SNPs).

Methods

A dataset of 38,000 heterozygous SNPs was used to estimate the AR distribution. The parameters of this reference distribution were then used to estimate the contamination probability of a sample. Validation was performed using 12 samples contaminated to different levels.
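
The general idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the statistical model, so the normal fit, the `alpha` cut-off, the simulated data and the function names `fit_reference`, `outlier_fraction` and `expected_shifted_ar` are all assumptions made for illustration. The sketch fits a reference distribution to heterozygous-SNP allele ratios and then measures what fraction of a sample's allele ratios falls outside the central interval of that reference.

```python
import random
from statistics import NormalDist, fmean, stdev


def fit_reference(reference_ars):
    """Fit a normal distribution to reference allele ratios (assumed model)."""
    return NormalDist(mu=fmean(reference_ars), sigma=stdev(reference_ars))


def outlier_fraction(sample_ars, reference, alpha=0.05):
    """Fraction of sample allele ratios outside the central (1 - alpha) interval."""
    lower = reference.inv_cdf(alpha / 2)
    upper = reference.inv_cdf(1 - alpha / 2)
    outside = sum(1 for ar in sample_ars if ar < lower or ar > upper)
    return outside / len(sample_ars)


def expected_shifted_ar(contamination):
    """Expected AR at a heterozygous site when a fraction `contamination` of
    reads comes from a contaminant that is homozygous reference at that site
    (simplifying assumption: reads mix in proportion to DNA content)."""
    return 0.5 * (1 - contamination)


# Simulated data only; real input would be allele ratios from variant calls.
rng = random.Random(0)
reference = fit_reference([rng.gauss(0.5, 0.05) for _ in range(5000)])

clean = [rng.gauss(0.5, 0.05) for _ in range(500)]
shifted_mean = expected_shifted_ar(0.20)
contaminated = [rng.gauss(shifted_mean, 0.05) for _ in range(500)]

clean_fraction = outlier_fraction(clean, reference)
contaminated_fraction = outlier_fraction(contaminated, reference)
print(f"clean: {clean_fraction:.2f}, contaminated: {contaminated_fraction:.2f}")
```

In this toy setting a clean sample leaves roughly `alpha` of its allele ratios outside the interval, while a contaminated sample leaves far more. A real pipeline would turn that excess into a contamination probability using the published method's own parameters.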

Results

The method readily detects contamination of 20% or more. Its limit of detection is about 10%; below this threshold the number of false positives increases significantly.

Conclusions

The method can be applied to any type of NGS analysis and is useful for quality control. Because it is fast and easy to implement, it is well suited for inclusion in NGS pipelines, where it can improve quality control of data and make results more robust.

SUBMITTER: Dallavilla T 

PROVIDER: S-EPMC8023143 | biostudies-literature |

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC6132906 | biostudies-literature
| S-EPMC4149544 | biostudies-literature
| S-EPMC6196395 | biostudies-literature
| S-EPMC4678847 | biostudies-literature
| S-EPMC4886510 | biostudies-literature
| S-EPMC3759424 | biostudies-literature
| S-EPMC4315405 | biostudies-literature
| S-EPMC6765363 | biostudies-literature
| S-EPMC4544636 | biostudies-literature
| S-EPMC6918607 | biostudies-literature