Dataset Information

Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome.


ABSTRACT: Quality control for next-generation sequencing (NGS) has become increasingly important as omics studies rely ever more heavily on sequencing data. Tools have been developed for filtering possible contaminants from species with known reference genomes. Unfortunately, these tools require reference genomes for all the species involved, including the contaminants. This excludes many real-life samples for which no complete genome of the target species is available and which are contaminated with unknown microbial species. In this work, we propose QC-Blind, a novel quality control pipeline that removes contaminants without using any reference genomes. The pipeline requires only information about a few marker genes of the target species. It consists of unsupervised read assembly, contig binning, read clustering, and marker gene assignment. When evaluated on in silico, ab initio, and in vivo datasets, QC-Blind proved effective in removing unknown contaminants with high specificity and accuracy while preserving most of the genomic information of the target bacterial species. QC-Blind could therefore serve well in situations where limited information is available for both the target and the contaminating species.
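
The final stage named in the abstract, marker gene assignment, can be illustrated with a minimal sketch. This is not QC-Blind's actual implementation: it assumes that assembly, contig binning, and a marker gene search have already been run with external tools, and that their results are available as plain dictionaries. The function names `select_target_bins` and `filter_reads`, the `min_markers` threshold, and the input layouts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def select_target_bins(contig_bins, marker_hits, min_markers=1):
    """Return the bin labels that carry target marker genes.

    contig_bins: dict mapping contig id -> bin label (assumed binning output).
    marker_hits: dict mapping contig id -> list of target marker gene names
                 found on that contig (assumed marker search output).
    min_markers: hypothetical threshold on distinct marker genes per bin.
    """
    markers_per_bin = defaultdict(set)
    for contig, markers in marker_hits.items():
        bin_label = contig_bins.get(contig)
        if bin_label is not None:
            # Collect distinct marker genes so repeats are not double counted.
            markers_per_bin[bin_label].update(markers)
    return {b for b, m in markers_per_bin.items() if len(m) >= min_markers}


def filter_reads(read_to_contig, contig_bins, target_bins):
    """Split reads into kept (target bins) and removed (everything else).

    read_to_contig: dict mapping read id -> contig id, or None if unassigned.
    """
    kept, removed = [], []
    for read, contig in read_to_contig.items():
        if contig_bins.get(contig) in target_bins:
            kept.append(read)
        else:
            removed.append(read)
    return kept, removed
```

In this sketch, reads that were never assigned to a contig are treated as contaminants. That is a simplifying assumption, and a real pipeline might handle them differently.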

SUBMITTER: Xi W 

PROVIDER: S-EPMC6637319 | biostudies-literature | 2019

REPOSITORIES: biostudies-literature

Publications

Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome.

Xi Wang, Gao Yan, Cheng Zhangyu, Chen Chaoyun, Han Maozhen, Yang Pengshuo, Xiong Guangzhou, Ning Kang

Frontiers in Microbiology, 2019-07-09


Similar Datasets

| S-EPMC4018527 | biostudies-literature
| S-EPMC3270013 | biostudies-literature
| S-EPMC10264898 | biostudies-literature
| S-EPMC5829578 | biostudies-literature
| PRJEB5201 | ENA
2011-10-10 | PRD000373 | Pride
| S-EPMC8530316 | biostudies-literature
2015-11-04 | PXD003134 | Pride
| S-EPMC5813327 | biostudies-literature
| S-EPMC2707382 | biostudies-literature