Dataset Information

CoCo: RNA-seq read assignment correction for nested genes and multimapped reads.


ABSTRACT: MOTIVATION: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. RESULTS: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons. AVAILABILITY AND IMPLEMENTATION: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
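
The proportional distribution of multimapped reads mentioned in the abstract can be illustrated with a minimal sketch. This is not CoCo's actual code: the function name `distribute_multimapped`, its input layout, and the choice to weight each gene by its uniquely mapped read count (with an even split when no gene in the group has unique reads) are all assumptions made for illustration.

```python
from collections import defaultdict


def distribute_multimapped(unique_counts, multimapped_groups):
    """Hypothetical helper, not part of CoCo.

    unique_counts: dict mapping gene name -> uniquely mapped read count.
    multimapped_groups: list of (genes, n_reads) pairs, where `genes` is a
        tuple of gene names sharing `n_reads` multimapped reads.
    Returns a dict mapping gene name -> corrected (possibly fractional) count.
    """
    corrected = defaultdict(float, {g: float(c) for g, c in unique_counts.items()})
    for genes, n_reads in multimapped_groups:
        weights = [unique_counts.get(g, 0) for g in genes]
        total = sum(weights)
        for gene, weight in zip(genes, weights):
            # Assumption: split in proportion to unique counts; if the group
            # has no unique reads at all, fall back to an even split.
            share = weight / total if total > 0 else 1 / len(genes)
            corrected[gene] += n_reads * share
    return dict(corrected)


if __name__ == "__main__":
    unique = {"geneA": 30, "geneB": 10}
    groups = [(("geneA", "geneB"), 8)]
    print(distribute_multimapped(unique, groups))
```

In this toy input, geneA holds three quarters of the unique reads in its group, so it receives three quarters of the 8 shared reads, and geneB receives the rest.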

SUBMITTER: Deschamps-Francoeur G 

PROVIDER: S-EPMC6901076 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

CoCo: RNA-seq read assignment correction for nested genes and multimapped reads.

Deschamps-Francoeur Gabrielle, Boivin Vincent, Abou Elela Sherif, Scott Michelle S

Bioinformatics (Oxford, England), 2019 Dec 1, issue 23


Similar Datasets

| S-EPMC4615873 | biostudies-literature
| S-EPMC7671312 | biostudies-literature
| S-EPMC5550947 | biostudies-other
| S-EPMC4064318 | biostudies-literature
| S-EPMC9341511 | biostudies-literature
| S-EPMC10893331 | biostudies-literature
| S-EPMC6842140 | biostudies-literature
| S-EPMC6701478 | biostudies-literature
| S-EPMC7370890 | biostudies-literature
| S-EPMC11329654 | biostudies-literature