Dataset Information

Short paired-end reads trump long single-end reads for expression analysis.


ABSTRACT: BACKGROUND: Typical experimental design advice for expression analyses using RNA-seq generally assumes that single-end reads provide robust gene-level expression estimates in a cost-effective manner, and that the additional benefits obtained from paired-end sequencing are not worth the additional cost. However, in many cases (e.g., with Illumina NextSeq and NovaSeq instruments), shorter paired-end reads and longer single-end reads can be generated for the same cost, and it is not obvious which strategy should be preferred. Using publicly available data, we test whether short paired-end reads can achieve more robust expression estimates and differential expression results than single-end reads of approximately the same total number of sequenced bases. RESULTS: At both the transcript and gene levels, 2 × 40 paired-end reads unequivocally provide expression estimates that are more highly correlated with 2 × 125 than 1 × 75 reads; in nearly all cases, those correlations are also greater than for 1 × 125, despite the greater total number of sequenced bases for the latter. Across an array of metrics, differential expression tests based upon 2 × 40 consistently outperform those using 1 × 75. CONCLUSION: Researchers seeking a cost-effective approach for gene-level expression analysis should prefer short paired-end reads over a longer single-end strategy. Short paired-end reads will also give reasonably robust expression estimates and differential expression results at the isoform level.
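
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes each read layout is summarized as (reads per fragment, read length) and that agreement with the 2 × 125 reference is measured as a Pearson correlation of log2(x + 1) expression values; the abstract does not say which correlation measure or transformation the authors used, so both are assumptions. The function names and the toy expression values are hypothetical and are not data from the study.

```python
import math

# Read layouts named in the abstract: (reads per fragment, read length in bases).
LAYOUTS = {
    "2x40": (2, 40),
    "1x75": (1, 75),
    "1x125": (1, 125),
    "2x125": (2, 125),
}


def bases_per_fragment(layout):
    """Total sequenced bases per fragment for a named layout."""
    reads, length = LAYOUTS[layout]
    return reads * length


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_correlation(estimates, reference):
    """Correlation of log2(x + 1) values (assumed metric, not the paper's)."""
    a = [math.log2(v + 1) for v in estimates]
    b = [math.log2(v + 1) for v in reference]
    return pearson(a, b)


# Hypothetical TPM-like values for five genes; made up for illustration only.
reference_2x125 = [10.0, 250.0, 0.0, 42.0, 1300.0]
toy_2x40 = [11.0, 240.0, 0.0, 40.0, 1280.0]
toy_1x75 = [18.0, 200.0, 3.0, 60.0, 1100.0]

if __name__ == "__main__":
    for name in LAYOUTS:
        print(name, bases_per_fragment(name), "bases per fragment")
    print("2x40 vs 2x125:", log_correlation(toy_2x40, reference_2x125))
    print("1x75 vs 2x125:", log_correlation(toy_1x75, reference_2x125))
```

The per-fragment totals show why 2 × 40 and 1 × 75 count as approximately the same sequencing effort (80 versus 75 bases), while 1 × 125 uses more. The toy correlations carry no evidential weight; they only show the shape of the calculation.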

SUBMITTER: Freedman AH 

PROVIDER: S-EPMC7168855 | biostudies-literature | 2020 Apr

REPOSITORIES: biostudies-literature

Publications

Short paired-end reads trump long single-end reads for expression analysis.

Freedman AH, Gaspar JM, Sackton TB

BMC Bioinformatics, 2020 Apr 19, 1


Similar Datasets

S-EPMC3614465 | biostudies-other
S-EPMC3919575 | biostudies-literature
S-EPMC3158087 | biostudies-literature
S-EPMC4582294 | biostudies-literature
S-EPMC3076424 | biostudies-literature
S-EPMC6280916 | biostudies-literature
S-EPMC3684618 | biostudies-literature
S-EPMC5009668 | biostudies-literature
S-EPMC3530679 | biostudies-literature
S-EPMC10231564 | biostudies-literature