Dataset Information

seqQscorer: automated quality control of next-generation sequencing data using machine learning.


ABSTRACT: Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer .

SUBMITTER: Albrecht S 

PROVIDER: S-EPMC7934511 | biostudies-literature | 2021 Mar

REPOSITORIES: biostudies-literature

Publications

seqQscorer: automated quality control of next-generation sequencing data using machine learning.

Albrecht S, Sprang M, Andrade-Navarro MA, Fontaine JF

Genome Biology, 2021 Mar 5, issue 1


Similar Datasets

| S-EPMC4018527 | biostudies-literature
| S-EPMC6938691 | biostudies-literature
| S-EPMC7995270 | biostudies-literature
| S-EPMC4604202 | biostudies-literature
| S-EPMC8610268 | biostudies-literature
| S-EPMC3270013 | biostudies-literature
| S-EPMC4268903 | biostudies-literature
| S-EPMC2707382 | biostudies-literature
| S-EPMC6258556 | biostudies-literature
| S-EPMC7379936 | biostudies-literature