Dataset Information

Guide for library design and bias correction for large-scale transcriptome studies using highly multiplexed RNAseq methods.

ABSTRACT: BACKGROUND: Standard RNAseq methods using bulk RNA and recent single-cell RNAseq methods use DNA barcodes to identify samples and cells, and the barcoded cDNAs are pooled into a library pool before high-throughput sequencing. For single-cell and low-input RNAseq methods, the library is further amplified by PCR after pooling. Preparing hundreds or more samples for a large study often requires multiple library pools. However, the correlation between expression profiles from different libraries is sometimes low, and batch-effect biases make it difficult to integrate data across library pools. RESULTS: We investigated 166 technical replicates in 14 RNAseq libraries made using the STRT method. The patterns of the library biases differed between genes, and uneven library yields were associated with library biases. The former bias was corrected using the NBGLM-LBC algorithm, which we present in the current study. The latter bias could not be corrected directly, but could be avoided by omitting libraries with particularly low yields. A simulation experiment suggested that library bias correction using NBGLM-LBC requires a consistent sample layout. The NBGLM-LBC correction method was applied to an expression profile from a cohort study of childhood acute respiratory illness, and the library biases were resolved. CONCLUSIONS: The R source code for the library bias correction method, NBGLM-LBC, is available at https://shka.github.io/NBGLM-LBC and https://shka.bitbucket.io/NBGLM-LBC. This method can correct library biases in various studies that use highly multiplexed sequencing-based profiling methods, provided the sample layout is consistent, with the samples to be compared (e.g., "cases" and "controls") equally distributed in each library.
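The abstract describes NBGLM-LBC only at a high level, and the published implementation is in R. The Python sketch below is not that implementation; it only illustrates the general idea of a per-gene negative binomial library-effect correction under stated assumptions. It assumes that one gene's count in sample i follows a negative binomial distribution with mean s_i·exp(b_lib) (s_i is a per-sample size factor, b_lib a per-library log-level) and a fixed, user-supplied dispersion `alpha`, whereas a full implementation would normally estimate the dispersion from the data. Each library is then rescaled to the geometric mean of the fitted library levels. The function names `nb_score`, `fit_library_level` and `correct_gene` are hypothetical.

```python
import math
from statistics import fmean


def nb_score(b, counts, size_factors, alpha):
    """NB score for log-level b, with mean mu_i = s_i * exp(b) and
    variance mu + alpha * mu^2 (alpha is assumed known, not estimated)."""
    total = 0.0
    for y, s in zip(counts, size_factors):
        mu = s * math.exp(b)
        total += (y - mu) / (1.0 + alpha * mu)
    return total


def fit_library_level(counts, size_factors, alpha):
    """Maximum-likelihood log-level of one gene within one library.

    The score is strictly decreasing in b, so bracketing followed by
    bisection finds its unique root. Requires at least one nonzero count.
    """
    if sum(counts) <= 0:
        raise ValueError("library has no reads for this gene")
    b0 = math.log(sum(counts) / sum(size_factors))  # Poisson estimate as start
    lo, hi = b0 - 1.0, b0 + 1.0
    while nb_score(lo, counts, size_factors, alpha) <= 0:  # widen until score > 0
        lo -= 1.0
    while nb_score(hi, counts, size_factors, alpha) >= 0:  # widen until score < 0
        hi += 1.0
    for _ in range(200):  # bisection keeps score(lo) > 0 >= score(hi)
        mid = 0.5 * (lo + hi)
        if nb_score(mid, counts, size_factors, alpha) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def correct_gene(counts, libraries, size_factors, alpha=0.1):
    """Remove per-library level shifts from one gene's counts.

    Every library is rescaled to the geometric mean of the fitted library
    levels, so the gene's overall expression level is preserved.
    """
    levels = {}
    for lib in set(libraries):
        idx = [i for i, name in enumerate(libraries) if name == lib]
        levels[lib] = fit_library_level(
            [counts[i] for i in idx], [size_factors[i] for i in idx], alpha
        )
    reference = fmean(levels.values())  # mean log-level = log of geometric mean
    return [c * math.exp(reference - levels[lib])
            for c, lib in zip(counts, libraries)]


# Toy example: lib2 reads three times higher than lib1 for the same samples.
counts = [10, 20, 30, 60]
libraries = ["lib1", "lib1", "lib2", "lib2"]
size_factors = [1.0, 2.0, 1.0, 2.0]
print([round(x, 2) for x in correct_gene(counts, libraries, size_factors)])
# → [17.32, 34.64, 17.32, 34.64]
```

The toy example also shows why the abstract insists on a consistent sample layout: the library term absorbs any systematic difference between libraries, so if lib2 held only "cases" with genuinely higher expression, this kind of correction would remove that biological signal along with the bias.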

SUBMITTER: Katayama S

PROVIDER: S-EPMC6693229 | biostudies-literature | 2019 Aug

REPOSITORIES: biostudies-literature


Publications

Guide for library design and bias correction for large-scale transcriptome studies using highly multiplexed RNAseq methods.

Shintaro Katayama, Tiina Skoog, Cilla Söderhäll, Elisabet Einarsdottir, Kaarel Krjutškov, Juha Kere

BMC Bioinformatics, 13 August 2019, issue 1


PMID: 31409293

Similar Datasets

Project description: Applications of synthetic biology spanning human health, industrial bioproduction, and ecosystem monitoring often require small-molecule sensing capabilities, typically in the form of genetically encoded small-molecule biosensors. Deploying greater numbers of these systems depends on methods that support the rapid development of such biosensors against a broad range of small-molecule targets. Here, we use a previously developed method for selecting RNA biosensors against unmodified small molecules (DRIVER) to perform a selection against a densely multiplexed mixture of small molecules representative of those employed in high-throughput drug screening. Using a mixture of 5,120 target compounds randomly sampled from a large, diverse drug-screening library, we performed a 95-round selection and then analyzed the enriched RNA biosensor library using next-generation sequencing (NGS). From our analysis, we identified RNA biosensors showing at least a 2-fold change in signal in the presence of at least 217 distinct target compounds, with sensitivities down to 25 nM. Although many of these biosensors respond to multiple targets, clustering analysis indicated at least 150 different small-molecule sensing patterns. We also built a classifier that predicted whether the biosensors would respond to a new compound with an average precision of 0.82. Because the target compound library was designed to be representative of larger, diverse compound libraries, we expect that the described approach can be used with similar compound libraries to identify aptamers against other small molecules with a similar success rate. The new RNA biosensors (or their component aptamers) described in this work can be further optimized and used in applications such as biosensing, gene control, or enzyme evolution.
In addition, the data presented here provide an expanded compendium of new RNA aptamers compared with the 82 small-molecule RNA aptamers previously published in the literature, enabling further bioinformatic analyses of the general classes of small molecules for which RNA aptamers can be found.

