
Dataset Information


Evaluating the impact of sequencing error correction for RNA-seq data with ERCC RNA spike-in controls.


ABSTRACT: Sequencing errors are a major issue for several next-generation sequencing-based applications, such as de novo assembly and single nucleotide polymorphism detection. Several error-correction methods have been developed to improve raw data quality. However, error-correction performance is hard to evaluate because of the lack of a ground truth. In this study, we propose a novel approach that uses ERCC RNA spike-in controls as the ground truth to facilitate the evaluation of error-correction performance. After aligning raw and corrected RNA-seq data, we characterized read quality with three metrics: mismatch patterns (e.g., the rate of A-to-C substitutions) of reads aligned with one mismatch, mismatch patterns of reads aligned with two mismatches, and the percentage increase in reads aligned to the reference. We observed that the mismatch patterns of reads aligned with one mismatch are significantly correlated between ERCC spike-ins and real RNA samples. Based on these observations, we conclude that ERCC spike-ins can serve as ground truths for error correction, beyond their previous applications in validating dynamic range and fold-change response. In addition, the mismatch patterns of ERCC reads aligned with one mismatch can serve as a novel and reliable metric for evaluating the performance of error-correction tools.
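The abstract names the three metrics and the ERCC-versus-sample correlation but gives no formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes reads are already aligned without indels and supplied as hypothetical (reference, read) string pairs. It also assumes that a "substitution rate" means the share of each reference-to-read base change among the mismatches of reads with exactly k mismatches, and it uses Pearson's r for the correlation, which the abstract does not specify. All function names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import sqrt

BASES = "ACGT"
# The 12 ordered substitutions (reference base, read base), e.g. ("A", "C").
SUBSTITUTIONS = list(permutations(BASES, 2))
SUBSTITUTION_SET = set(SUBSTITUTIONS)


def mismatch_pattern(alignments, k):
    """Fraction of each substitution type among reads aligned with exactly k mismatches.

    Assumed (hypothetical) input: an iterable of (reference, read) strings of
    equal length covering the same aligned positions, with no indels.
    """
    counts = Counter()
    for ref, read in alignments:
        mismatches = [(r, q) for r, q in zip(ref, read) if r != q]
        if len(mismatches) != k:
            continue  # keep only reads with exactly k mismatches
        # Mismatches involving non-ACGT symbols (e.g. N) count toward k
        # but are left out of the substitution pattern.
        counts.update(m for m in mismatches if m in SUBSTITUTION_SET)
    total = sum(counts.values())
    return [counts[s] / total if total else 0.0 for s in SUBSTITUTIONS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (raises on zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def percent_increase_aligned(raw_aligned, corrected_aligned):
    """Percentage increase in the number of aligned reads after error correction."""
    return 100.0 * (corrected_aligned - raw_aligned) / raw_aligned


if __name__ == "__main__":
    # Toy reads for illustration only, not data from the study.
    ercc = [("ACGTACGT", "CCGTACGT"), ("ACGTACGT", "ACGTACGA"), ("ACGTACGT", "ACGTACGT")]
    sample = [("AAGGCCTT", "CAGGCCTT"), ("AAGGCCTT", "AAGGCCTA"), ("AAGGCCTT", "AAGGCCTC")]
    r = pearson(mismatch_pattern(ercc, 1), mismatch_pattern(sample, 1))
    print(round(r, 3))  # 0.775
    print(percent_increase_aligned(900, 945))  # 5.0
```

In a real pipeline the mismatches would more likely be read from aligner output (for example, the NM and MD tags of SAM/BAM records) than from string comparison. The pattern for one-mismatch reads from ERCC spike-ins would then be correlated with the pattern from the biological sample, once for raw data and once for corrected data.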

SUBMITTER: Tong L 

PROVIDER: S-EPMC4983418 | biostudies-literature | 2016 Feb

REPOSITORIES: biostudies-literature


Publications

Evaluating the impact of sequencing error correction for RNA-seq data with ERCC RNA spike-in controls.

Li Tong, Cheng Yang, Po-Yen Wu, May D. Wang

... IEEE-EMBS International Conference on Biomedical and Health Informatics. IEEE-EMBS International Conference on Biomedical and Health Informatics. 2016 Feb 1.



Similar Datasets

| S-EPMC3664804 | biostudies-literature
| S-EPMC4615873 | biostudies-literature
2010-04-18 | E-GEOD-20580 | biostudies-arrayexpress
| S-EPMC3228814 | biostudies-other
| S-EPMC3879328 | biostudies-literature
2015-05-20 | E-GEOD-66694 | biostudies-arrayexpress
| S-EPMC3409179 | biostudies-literature
2015-05-20 | GSE66694 | GEO
| S-EPMC4248469 | biostudies-literature
| S-EPMC4937190 | biostudies-literature