Dataset Information

Short paired-end reads trump long single-end reads for expression analysis.

ABSTRACT:

BACKGROUND: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases.

RESULTS: At both the transcript and gene levels, expression estimates from 2 × 40 paired-end reads are unequivocally more highly correlated with those from 2 × 125 reads than estimates from 1 × 75 reads are; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75.

CONCLUSION: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.
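
The comparison "approximately the same total number of sequenced bases" can be checked with simple arithmetic. The following is a minimal sketch, assuming the same number of sequenced fragments for every layout; the names `layouts` and `bases_per_fragment` are illustrative and do not come from the paper.

```python
# Read layouts named in the abstract: (reads per fragment, read length in bases).
layouts = {
    "2x40": (2, 40),
    "1x75": (1, 75),
    "1x125": (1, 125),
    "2x125": (2, 125),
}


def bases_per_fragment(n_reads, read_length):
    """Total bases sequenced for one fragment under a given layout."""
    return n_reads * read_length


# Hypothetical helper table: layout name -> bases sequenced per fragment.
totals = {name: bases_per_fragment(*spec) for name, spec in layouts.items()}

for name, total in totals.items():
    print(f"{name}: {total} bases per fragment")
```

Under this assumption, 2 × 40 yields 80 bases per fragment and 1 × 75 yields 75, so the two are roughly matched, while 1 × 125 sequences more bases (125) and 2 × 125 serves as the 250-base reference layout.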

SUBMITTER: Freedman AH

PROVIDER: S-EPMC7168855 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature

Publications

Short paired-end reads trump long single-end reads for expression analysis.

Freedman AH, Gaspar JM, Sackton TB

BMC Bioinformatics, 2020 Apr 19 (issue 1)

PMID: 32306895

Similar Datasets

Accurate indel prediction using paired-end short reads. | S-EPMC3614465 | biostudies-other

Inferring short tandem repeat variation from paired-end short reads. | S-EPMC3919575 | biostudies-literature

Meraculous: de novo genome assembly with short paired-end reads. | S-EPMC3158087 | biostudies-literature

Konnector v2.0: pseudo-long reads from paired-end sequencing data. | S-EPMC4582294 | biostudies-literature

Local de novo assembly of RAD paired-end contigs using short sequencing reads. | S-EPMC3076424 | biostudies-literature

De novo finished 2.8 Mbp Staphylococcus aureus genome assembly from 100 bp short and long range paired-end reads. | S-EPMC6280916 | biostudies-literature

A filtering method to generate high quality short reads using illumina paired-end technology. | S-EPMC3684618 | biostudies-literature

Short tandem repeat number estimation from paired-end reads for multiple individuals by considering coalescent tree. | S-EPMC5009668 | biostudies-literature

SOAPindel: efficient identification of indels from short paired reads. | S-EPMC3530679 | biostudies-literature

Elucidation of genomic organizations of transgenic soybean plants through de novo genome assembly with short paired-end reads. | S-EPMC10231564 | biostudies-literature