Dataset Information

CoCo: RNA-seq read assignment correction for nested genes and multimapped reads.

ABSTRACT: MOTIVATION: Next-generation sequencing techniques revolutionized the study of RNA expression by permitting whole transcriptome analysis. However, sequencing reads generated from nested and multi-copy genes are often either misassigned or discarded, which greatly reduces both quantification accuracy and gene coverage. RESULTS: Here we present count corrector (CoCo), a read assignment pipeline that takes into account the multitude of overlapping and repetitive genes in the transcriptome of higher eukaryotes. CoCo uses a modified annotation file that highlights nested genes and proportionally distributes multimapped reads between repeated sequences. CoCo salvages over 15% of discarded aligned RNA-seq reads and significantly changes the abundance estimates for both coding and non-coding RNA as validated by PCR and bedgraph comparisons. AVAILABILITY AND IMPLEMENTATION: The CoCo software is an open source package written in Python and available from http://gitlabscottgroup.med.usherbrooke.ca/scott-group/coco. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
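
The idea of proportionally distributing multimapped reads can be illustrated with a minimal sketch. This is not CoCo's code: the function name `distribute_multimapped`, the input layout, and the choice to weight by uniquely mapped read counts (with an even split when no candidate gene has unique reads) are assumptions made here for illustration only.

```python
from collections import defaultdict


def distribute_multimapped(unique_counts, multimapped_groups):
    """Hypothetical helper: split multimapped reads among candidate genes.

    unique_counts: dict mapping gene id -> number of uniquely mapped reads.
    multimapped_groups: iterable of (genes, n_reads) pairs, where `genes` is a
        tuple of gene ids that a set of n_reads multimapped reads aligns to.
    Returns a dict of corrected (possibly fractional) counts per gene.
    """
    corrected = defaultdict(float)
    for gene, count in unique_counts.items():
        corrected[gene] = float(count)

    for genes, n_reads in multimapped_groups:
        # Weight each candidate by its uniquely mapped reads (assumed rule).
        total_unique = sum(unique_counts.get(g, 0) for g in genes)
        for g in genes:
            if total_unique > 0:
                share = unique_counts.get(g, 0) / total_unique
            else:
                # No unique evidence for any candidate: split evenly.
                share = 1.0 / len(genes)
            corrected[g] += n_reads * share

    return dict(corrected)


# Toy data, invented for this example.
unique = {"geneA": 30, "geneB": 10, "geneC": 0}
multi = [(("geneA", "geneB"), 8), (("geneC", "geneD"), 4)]
result = distribute_multimapped(unique, multi)
```

In this toy input, the 8 reads shared by `geneA` and `geneB` are split 3:1, following their unique counts, while the 4 reads shared by `geneC` and `geneD` are split evenly because neither has unique reads.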

SUBMITTER: Deschamps-Francoeur G

PROVIDER: S-EPMC6901076 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

CoCo: RNA-seq read assignment correction for nested genes and multimapped reads.

Deschamps-Francoeur G, Boivin V, Abou Elela S, Scott MS

Bioinformatics (Oxford, England), 2019-12-01, issue 23

PMID: 31141144

Similar Datasets

Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.
| S-EPMC4615873 | biostudies-literature

Genotyping-free parentage assignment using RAD-seq reads.
| S-EPMC7391307 | biostudies-literature

Read trimming is not required for mapping and quantification of RNA-seq reads at the gene level.
| S-EPMC7671312 | biostudies-literature

kakapo: easy extraction and annotation of genes from raw RNA-seq reads.
| S-EPMC10688300 | biostudies-literature

Local sequence and sequencing depth dependent accuracy of RNA-seq reads.
| S-EPMC5550947 | biostudies-other

HLA typing from RNA-Seq sequence reads.
| S-EPMC4064318 | biostudies-literature

TransMeta simultaneously assembles multisample RNA-seq reads.
| S-EPMC9341511 | biostudies-literature

Three Rounds of Read Correction Significantly Improve Eukaryotic Protein Detection in ONT Reads.
| S-EPMC10893331 | biostudies-literature

VARUS: sampling complementary RNA reads from the sequence read archive.
| S-EPMC6842140 | biostudies-literature

Targeted variant detection using unaligned RNA-Seq reads.
| S-EPMC6701478 | biostudies-literature