Dataset Information

From trash to treasure: detecting unexpected contamination in unmapped NGS data.

ABSTRACT:

BACKGROUND: Next Generation Sequencing (NGS) experiments produce millions of short sequences that, once mapped to a reference genome, provide biological insights at the genomic, transcriptomic and epigenomic levels. Typically, between 70% and 90% of reads map correctly to the reference genome, which in some cases leaves a considerable fraction of unmapped sequences. This 'misalignment' can be ascribed to low-quality bases or to sequence differences between the sample reads and the reference genome. Investigating the source of the unmapped reads is important for assessing the quality of the whole experiment and for checking for possible downstream or upstream 'contamination' from exogenous nucleic acids.

RESULTS: Here we propose DecontaMiner, a tool that detects contaminating sequences among the unmapped reads. It uses a subtraction approach to identify contamination from bacterial, fungal and viral genomes. DecontaMiner generates several output files that track all the processed reads and provide a complete report of their characteristics. Good-quality matches to microorganism genomes are counted and compared among samples. DecontaMiner builds an offline HTML page containing summary statistics and plots; the plots are drawn with the D3 JavaScript library. DecontaMiner has mainly been used to detect contamination in human RNA-Seq data. The software is freely available at http://www-labgtp.na.icar.cnr.it/decontaminer .

CONCLUSIONS: DecontaMiner is a tool designed and developed to investigate the presence of contaminating sequences in unmapped NGS data. It can suggest the presence of contaminating organisms in sequenced samples, which may derive either from laboratory contamination or from the biological source of the samples; in both cases they are worth further investigation and experimental validation. The novelty of DecontaMiner lies mainly in its easy integration with standard NGS data analysis procedures, while providing a complete, reliable and automatic pipeline.
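
The counting and cross-sample comparison step mentioned in the abstract could be sketched roughly as follows. This is only an illustration, not DecontaMiner's code: the hit tuple layout, the function names `count_good_matches` and `compare_samples`, and the identity and length thresholds are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical alignment hit: (sample, organism, percent_identity, aligned_length).
# The field layout and the default thresholds are illustrative assumptions,
# not DecontaMiner's actual input format or settings.
def count_good_matches(hits, min_identity=95.0, min_length=50):
    """Count hits per organism within each sample, keeping only good-quality ones."""
    counts = defaultdict(Counter)
    for sample, organism, identity, length in hits:
        if identity >= min_identity and length >= min_length:
            counts[sample][organism] += 1
    return counts

def compare_samples(counts):
    """Pivot per-sample counts into {organism: {sample: count}} for comparison."""
    samples = sorted(counts)
    organisms = sorted({org for c in counts.values() for org in c})
    return {org: {s: counts[s][org] for s in samples} for org in organisms}

hits = [
    ("S1", "Escherichia coli", 99.0, 75),
    ("S1", "Escherichia coli", 97.5, 60),
    ("S1", "Candida albicans", 80.0, 70),          # rejected: identity too low
    ("S2", "Escherichia coli", 98.0, 40),          # rejected: alignment too short
    ("S2", "Human alphaherpesvirus 1", 96.0, 90),
]
table = compare_samples(count_good_matches(hits))
for organism, per_sample in table.items():
    print(organism, per_sample)
```

An organism that appears with high counts in a single sample but not in the others is the kind of pattern such a table makes easy to spot; whether it reflects laboratory contamination or the biological source would still need experimental validation.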

SUBMITTER: Sangiovanni M

PROVIDER: S-EPMC6472186 | biostudies-literature | 2019 Apr

REPOSITORIES: biostudies-literature

Publications

From trash to treasure: detecting unexpected contamination in unmapped NGS data.

Mara Sangiovanni, Ilaria Granata, Amarinder Singh Thind, Mario Rosario Guarracino

BMC Bioinformatics, 2019-04-18, Suppl 4

PMID: 30999839

Similar Datasets

Diverse and widespread contamination evident in the unmapped depths of high throughput sequencing data. | S-EPMC4213012 | biostudies-literature

The hidden treasure in your data: phasing with unexpected weak anomalous scatterers from routine data sets. | S-EPMC5379167 | biostudies-literature

Trash to Treasure: Eco-Friendly and Practical Synthesis of Amides by Nitriles Hydrolysis in WEPPA. | S-EPMC6864965 | biostudies-literature

Cancer-Derived Extracellular Vesicle-Associated MicroRNAs in Intercellular Communication: One Cell's Trash Is Another Cell's Treasure. | S-EPMC6940802 | biostudies-literature

Detecting sample swaps in diverse NGS data types using linkage disequilibrium. | S-EPMC7391710 | biostudies-literature

Trash to Treasure: How Insect Protein and Waste Containers Can Improve the Environmental Footprint of Mosquito Egg Releases. | S-EPMC8950251 | biostudies-literature

One's trash is someone else's treasure: sequence read archives from Lepidoptera genomes provide material for genome reconstruction of their endosymbionts. | S-EPMC9426245 | biostudies-literature

A random forest classifier for detecting rare variants in NGS data from viral populations. | S-EPMC5548337 | biostudies-literature

Unexpected cross-species contamination in genome sequencing projects. | S-EPMC4243333 | biostudies-literature

Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. | S-EPMC3487130 | biostudies-literature