miRTrace quality control of small RNA-Seq data prepared from low-input, degraded and contaminated HEK-293T RNA samples
ABSTRACT: miRTrace is a tool for quality control and tracing taxonomic origins of microRNA sequencing data. It operates in two modes: Trace mode, in which the software reports the composition of clade-specific miRNAs; and QC mode, in which it performs an all-round quality control. To validate the QC mode of the software, we subjected in-house control samples from HEK-293T cells to various treatments, such as cross-species contamination with Drosophila S2 RNAs, sample dilution and RNase A digestion. These samples were processed using the QC mode of miRTrace. We demonstrate that miRTrace accurately identifies poor-quality samples and, to some extent, even the causes of the compromised quality.
Project description:Quality control (QC) in mass spectrometry (MS)-based proteomics is mainly based on data-dependent acquisition (DDA) analysis of standard samples. Here, we collected 2638 files acquired by data-independent acquisition (DIA) and paired DDA files from mouse liver digests using 21 mass spectrometers across nine laboratories over 31 months. Our data demonstrated that DIA-based LC-MS/MS-related consensus QC metrics exhibit higher sensitivity than DDA-based QC metrics in detecting changes in LC-MS status. We then optimized 15 metrics and invited 21 experts to manually assess the quality of the 2638 DIA files based on those metrics. Based on the annotation results, we developed an AI model for DIA-based QC on a training set of 2110 DIA files. This model predicted the liquid chromatography (LC) performance with an AUC of 0.91 and the MS performance with an AUC of 0.97 in an independent validation dataset (n = 528). Finally, we developed an offline software tool called iDIA-QC for convenient adoption of this methodology for LC-MS QC.
Project description:MicroRNA array data for 144 mouse lung tissue RNA samples were processed, of which 139 passed the visual quality control (QC) and data QC. To determine potential signaling pathways involved in MWCNT-associated pathological changes in comparison to asbestos, we determined up- and down-regulated miRNA expression in lung tissue at 1 year post-exposure.
Project description:Microarray technology provides a powerful tool for defining gene expression profiles of airway epithelium that lend insight into the pathogenesis of human airway disorders. The focus of this study was to establish rigorous quality control parameters to ensure that microarray assessment of the airway epithelium is not confounded by experimental artifact. Samples (total n=223) of trachea, large and small airway epithelium were collected by fiberoptic bronchoscopy of 144 individuals (42 healthy non-smokers, 49 healthy smokers, 11 symptomatic smokers, 22 smokers with lone emphysema with normal spirometry, and 20 smokers with COPD) and were processed and hybridized to Affymetrix HG-U133 2.0 Plus microarrays. The pre- and post-chip quality control (QC) criteria established included: (1) RNA quality, assessed by RNA Integrity Number (RIN) ≥7.0 using Agilent 2100 Bioanalyzer software; (2) cRNA transcript integrity, assessed by signal intensity ratio of GAPDH 3' to 5' probe sets ≤3.0; and (3) the multi-chip normalization scaling factor ≤10.0. Of the 223 samples, 213 (95.5%) passed the QC criteria. In a data set of 34 arrays (10 samples failing QC criteria, 24 randomly chosen samples passing QC criteria), correlation coefficients for pairwise comparisons of expression levels for 100 housekeeping genes in which at least one array failed the QC criteria were significantly lower (average Pearson r = 0.90 ± 0.04) and more broadly dispersed than correlation coefficients for pairwise comparisons between any two arrays that passed the QC criteria (average Pearson r = 0.97 ± 0.01). By using the QC cutoff criteria, the inter-array variability, as assessed by the coefficient of variation in the expression levels for 100 housekeeping genes, was reduced from 35.7% to 21.7%.
Based on the aberrant housekeeping gene data generated from samples failing the established QC criteria, we propose that the QC criteria outlined in this study can accurately distinguish high quality from low quality data and can be used to delete poor quality microarray samples before proceeding to higher-order biological analyses and interpretation. Affymetrix arrays were used to assess the quality of gene expression data in trachea, large airway and small airway epithelium obtained by fiberoptic bronchoscopy of 42 healthy non-smokers, 49 healthy smokers, 11 symptomatic smokers, 22 smokers with lone emphysema with normal spirometry, and 20 smokers with COPD.
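The three cutoffs stated in this description (RIN ≥7.0, GAPDH 3'/5' signal ratio ≤3.0, scaling factor ≤10.0) can be expressed as a simple pass/fail filter. The following is a minimal sketch only, not the study's actual code; the class name `ArrayQC`, its field names, and the function `passes_qc` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ArrayQC:
    """QC measurements for one array (field names are illustrative)."""
    rin: float               # RNA Integrity Number
    gapdh_3to5_ratio: float  # GAPDH 3' to 5' probe set signal ratio
    scaling_factor: float    # multi-chip normalization scaling factor


def passes_qc(sample: ArrayQC,
              min_rin: float = 7.0,
              max_ratio: float = 3.0,
              max_scaling: float = 10.0) -> bool:
    """Return True only if all three stated cutoffs are met."""
    return (sample.rin >= min_rin
            and sample.gapdh_3to5_ratio <= max_ratio
            and sample.scaling_factor <= max_scaling)


# Made-up example values, not data from the study.
good = ArrayQC(rin=8.1, gapdh_3to5_ratio=1.2, scaling_factor=2.5)
degraded = ArrayQC(rin=6.5, gapdh_3to5_ratio=1.2, scaling_factor=2.5)
print(passes_qc(good), passes_qc(degraded))
```

A sample fails as soon as any one criterion is violated, which matches the description's use of the criteria as joint cutoffs.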
Project description:Formalin-fixed, paraffin-embedded (FFPE) tissues have many advantages for identification of risk biomarkers, including wide availability and potential for extended follow-up endpoints. However, RNA derived from archival FFPE samples has limited quality. Here we identified parameters that determine which FFPE samples have the potential for successful RNA extraction, library preparation, and generation of usable RNA-seq data. We optimized library preparation protocols designed for use with FFPE samples using seven FFPE and fresh frozen replicate pairs, and tested the optimized protocols using a study set of 130 FFPE biopsies from women with benign breast disease. Metrics from RNA extraction and preparation procedures were collected and compared with bioinformatics sequencing summary statistics. Finally, a decision tree model was built to learn the relationship between pre-sequencing lab metrics and QC pass/fail status as determined by bioinformatics metrics. Samples that failed bioinformatics QC tended to have low median sample-wise correlation within the cohort (Spearman correlation < 0.75), a low number of reads mapped to gene regions (< 25 million), or a low number of detectable genes (< 11,400 detected genes with TPM > 4). The median RNA concentration and pre-capture library Qubit values for QC-failed samples were 18.9 ng/ul and 2.08 ng/ul respectively, which were significantly lower than those of QC-passed samples (40.8 ng/ul and 5.82 ng/ul). We built a decision tree model based on input RNA concentration and input library Qubit values, and achieved an F score of 0.848 in predicting the QC status (pass/fail) of FFPE samples. We provide a bioinformatics quality control recommendation for FFPE samples from breast tissue by evaluating bioinformatic and sample metrics.
Our results suggest a minimum concentration of 25 ng/ul FFPE-extracted RNA for library preparation and 1.7 ng/ul pre-capture library output to achieve adequate RNA-seq data for downstream bioinformatics analysis.
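The thresholds quoted in this description can be sketched as two small checks: one flagging the bioinformatics characteristics that failed samples tended to show, and one applying the suggested minimum lab inputs. This is an illustrative sketch, not the authors' decision tree model; the function names and flag labels are hypothetical, and the detected-gene cutoff is assumed to mean "fewer than 11,400 genes with TPM > 4".

```python
def bioinformatics_qc_flags(median_spearman: float,
                            reads_mapped_to_genes: int,
                            detected_genes: int) -> list[str]:
    """List which of the three described warning signs a sample shows."""
    flags = []
    if median_spearman < 0.75:
        flags.append("low_sample_correlation")
    if reads_mapped_to_genes < 25_000_000:
        flags.append("low_mapped_reads")
    # Assumption: the 11,400 figure is a lower bound on genes with TPM > 4.
    if detected_genes < 11_400:
        flags.append("low_detected_genes")
    return flags


def meets_input_recommendation(rna_conc_ng_per_ul: float,
                               library_qubit_ng_per_ul: float) -> bool:
    """Apply the suggested minimums: 25 ng/ul RNA and 1.7 ng/ul library."""
    return rna_conc_ng_per_ul >= 25.0 and library_qubit_ng_per_ul >= 1.7


# Median values reported for QC-failed and QC-passed samples.
print(meets_input_recommendation(18.9, 2.08))  # failed-sample medians
print(meets_input_recommendation(40.8, 5.82))  # passed-sample medians
```

Because the description says failed samples "tended to" show these characteristics, the flags are better read as warning signs than as strict failure rules.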
Project description:Every laboratory performing mass spectrometry-based proteomics strives to generate high-quality data. Among the many factors that influence the outcome of any proteomics experiment is the performance of the LC-MS system, which should be monitored continuously. This process is termed quality control (QC). We present an easy-to-use, rapid tool that produces a visual, HTML-based report including the key parameters needed to monitor LC-MS system performance. The tool, named RawBeans, can generate a report for individual files or for a set of samples from a whole experiment. We anticipate it will help proteomics users and experts evaluate raw data quality, independent of data processing. The tool is available here: https://bitbucket.org/incpm/prot-qc/downloads.
Project description:Total RNA was extracted from 15 samples of 293T cells subjected to knockdown with siRNAs targeting transcription splicing factors, and was sequenced on the Illumina HiSeq PE150 platform to generate RNA sequencing reads 150 bp in length. Nearly 50 million raw reads were obtained from each sample. We used FastQC to confirm the quality of the raw FASTQ sequencing data, and EricScript software to detect fusion transcripts.
Project description:In this study, we aim to present a global view of transcriptome dynamics in different rice cultivars (IR64, Nagina 22 and Pokkali) under control and stress conditions. More than 50 million high-quality reads were obtained for each tissue sample using the Illumina platform. Reference-based assembly was performed for each rice cultivar. Transcriptome dynamics were studied by differential gene expression analysis between stress-treated and control samples. We collected seedlings of three rice cultivars subjected to control (kept in water), desiccation (transferred onto folds of tissue paper) and salinity (transferred to a beaker containing 200 mM NaCl solution) treatments. Total RNA isolated from these tissue samples was subjected to Illumina sequencing. The sequence data were further filtered using NGS QC Toolkit to obtain high-quality reads. The filtered reads were mapped to the Japonica reference genome using TopHat. Cufflinks was used for reference-based assembly, and differential gene expression was studied using Cuffdiff. The genes differentially expressed under the various abiotic stress conditions were identified.
Project description:Spatially resolved transcriptomics (SRT) has enabled precise genome-wide mRNA expression profiling within tissue sections. The performance of unbiased SRT methods targeting the polyA tail of mRNA relies on the availability of specimens with high RNA quality. Moreover, the high cost of currently available SRT assays requires a careful sample screening process to increase the chance of obtaining high-quality data. Indeed, the upfront analysis of RNA quality can show considerable variability due to sample handling, storage, and/or intrinsic factors. We present RNA-Rescue Spatial Transcriptomics (RRST), an SRT workflow designed to improve mRNA recovery from fresh frozen (FF) specimens with moderate to low RNA quality. First, we provide a benchmark of RRST against the standard Visium spatial gene expression protocol on high RNA quality samples represented by mouse brain and prostate cancer samples. Then, we demonstrate the RRST protocol on tissue sections collected from five challenging tissue types: human lung, colon, small intestine, pediatric brain tumor, and mouse bone/cartilage. In total, we analyzed 52 tissue sections, and our results demonstrate that RRST is a versatile, powerful, and reproducible protocol for FF specimens of different qualities and origins.