Dataset Information

RNA-Seq data for 35 controls using total RNA extracted using RNAZol

ABSTRACT: Methods routinely used to analyze RNA sequencing data focus on statistical significance and on detecting differential gene expression changes that meet a two-fold minimum change between groups. Because of the expression variability unique to RNA sequencing data, this strategy may overlook or obscure valuable information when specific genes show large expression variability in certain samples. This paper develops tools and methods that apply variance and dispersion estimates to intra-group data in order to identify genes with expression values that diverge from the group envelope. STRING database analysis of the genes identified in this way characterizes gene affiliations involved in physiological regulatory networks that are associated with biological variability. Samples or genes identified as divergent can be judiciously evaluated prior to any standard differential analysis. A three-step process is presented for evaluating biological variability within a group in RNA sequencing data, in which gene counts were: (1) scaled to minimize heteroscedasticity; (2) rank-ordered to detect potentially divergent “trendlines” for every gene in the data set; and (3) tested with the STRING database to identify statistically significant pathway associations among the genes displaying marked trendline variability and dispersion. This approach was used to identify and portray the “trendline” profile of every gene in three test data sets. Control data from an in-house data set and two archived samples revealed that 65-70% of the sequenced genes displayed trendlines with minimal variation and dispersion across the sample group after rank-ordering the samples; this is referred to as a linear trendline. Nonlinear trendlines refer to all cases where the trendline is not linear. Smaller subsets of genes within the three data sets displayed markedly skewed trendlines, wide dispersion, and variability. 
STRING database analysis of these genes identified interferon-mediated response networks in 11-20 % of the individuals sampled at the time of blood collection. For example, in the three control data sets, 14 to 26 genes in the defense response to virus pathway were identified in 7 individuals at false discovery rates ≤ 1.92 E-15. Gene clusters involving leukocyte and neutrophil activation and degranulation pathways were also detected.
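
The first two steps of the process described in the abstract (scaling, then rank-ordering each gene's values across samples) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which scaling was used, so a log2(count + 1) transform is assumed here, and the R² of a straight-line fit with a cutoff of 0.8 is a hypothetical stand-in for whatever variance and dispersion criteria the paper applies. All function names are invented for this example. The third step, the STRING database query, is not shown.

```python
import math
from statistics import mean


def scale(counts):
    """Assumed scaling: log2(count + 1) to damp heteroscedasticity."""
    return [math.log2(c + 1) for c in counts]


def trendline(counts):
    """Rank-order one gene's scaled values across the samples of a group."""
    return sorted(scale(counts))


def linear_fit_r2(values):
    """R^2 of a least-squares line through (rank, value) pairs."""
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    if syy == 0:
        # A perfectly flat trendline is treated as linear.
        return 1.0
    return (sxy * sxy) / (sxx * syy)


def flag_nonlinear(gene_counts, r2_cutoff=0.8):
    """Return genes whose rank-ordered trendline departs from a straight line.

    gene_counts maps a gene name to its raw counts, one per sample.
    The cutoff is a hypothetical value chosen for illustration.
    """
    return [
        gene
        for gene, counts in gene_counts.items()
        if linear_fit_r2(trendline(counts)) < r2_cutoff
    ]


if __name__ == "__main__":
    example = {
        "GENE_A": [1, 3, 7, 15, 31, 63],        # steady rise after scaling
        "GENE_B": [10, 10, 10, 10, 10, 5000],   # one strongly divergent sample
    }
    print(flag_nonlinear(example))
```

In this toy input, GENE_B is flagged because a single sample sits far above the rest of the group, which is the kind of intra-group divergence the abstract describes. Genes flagged this way would then be the candidates submitted to a pathway analysis.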

ORGANISM(S): Homo sapiens

PROVIDER: GSE169359 | GEO | 2021/03/23

REPOSITORIES: GEO

Similar Datasets

Project description: Tistlia consotensis is a halotolerant Rhodospirillaceae that was isolated from a saline spring located in the Colombian Andes with a salt concentration close to seawater (4.5% w/vol). We cultivated this microorganism in three NaCl concentrations, i.e. optimal (0.5%), without (0.0%) and high (4.0%) salt concentration, and analyzed its cellular proteome. For assigning tandem mass spectrometry data, we first sequenced its genome and constructed a six-reading-frame ORF database from the draft sequence. We annotated only the genes whose products (872) were detected. We compared the quantitative proteome data sets recorded for the three different growth conditions. Peak lists were generated with the MASCOT DAEMON software (version 2.3.2) from Matrix Science using the extract_msn.exe data import filter from the Xcalibur FT package (version 2.0.7) proposed by ThermoFisher. Data import filter options were set at 400 (minimum mass), 5,000 (maximum mass), 0 (grouping tolerance), 0 (intermediate scans), and 1,000 (threshold). MS/MS spectra were searched against the home-made ORF database with the following parameters: tryptic peptides with a maximum of 2 missed cleavages during proteolytic digestion, a mass tolerance of 5 ppm on the parent ion and 0.5 Da on the MS/MS, fixed modification for carbamidomethylated Cys (+57.0215) and variable modification for oxidized Met (+15.9949). All peptide matches with a peptide score above its query threshold set at p < 0.05 with the ORF database and rank 1 were parsed using the IRMa 1.28.0 software. The false-positive rate for peptide identification was estimated using a decoy database as below 0.5% with these parameters. MS/MS spectra assigned to several loci were systematically removed. A protein was considered validated when at least two different peptides were detected in the same experiment. 
False-positive identification of proteins was estimated using a reverse decoy database as below 0.1% with these parameters. The number of MS/MS spectra per protein (spectral counts) was determined for the three replicates in each growth condition. The protein abundances were compared using the T-Fold option of the PatternLab 2.0 software. This module allows normalising the spectral count datasets, calculating the average fold changes with statistics (t-test), and estimating the resulting theoretical false discovery rate. Stringent parameters were used in this analysis: minimum fold change of 1.5, minimum p-value of 0.05 and BH-FDR alpha of 0.15.
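
The selection criteria listed above (fold change, p-value, and a Benjamini-Hochberg FDR level) can be illustrated with a short sketch. This is not PatternLab's T-Fold code, whose internals are not described here. It is a plain Benjamini-Hochberg step-up procedure combined with simple threshold filters, and it assumes that the fold changes and t-test p-values have already been computed elsewhere. The function names and the record layout are hypothetical, and the 0.05 value is read here as an upper bound on the p-value.

```python
def benjamini_hochberg(p_values, alpha=0.15):
    """Benjamini-Hochberg step-up: return one accept/reject flag per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k with p_(k) <= k * alpha / m.
        if p_values[idx] <= rank * alpha / m:
            largest_passing_rank = rank
    rejected = [False] * m
    for idx in order[:largest_passing_rank]:
        rejected[idx] = True
    return rejected


def select_proteins(records, min_fold=1.5, max_p=0.05, alpha=0.15):
    """Filter (protein_id, fold_change, p_value) records.

    fold_change is assumed to be a positive ratio between two conditions,
    so 0.5 counts as a two-fold decrease.
    """
    flags = benjamini_hochberg([p for _, _, p in records], alpha)
    selected = []
    for (protein_id, fold, p), passes_fdr in zip(records, flags):
        magnitude = max(fold, 1.0 / fold)
        if passes_fdr and p <= max_p and magnitude >= min_fold:
            selected.append(protein_id)
    return selected


if __name__ == "__main__":
    # Hypothetical records, not taken from the data set.
    demo = [
        ("P1", 2.0, 0.01),
        ("P2", 1.2, 0.04),
        ("P3", 0.5, 0.03),
        ("P4", 3.0, 0.50),
    ]
    print(select_proteins(demo))
```

Applying the fold-change and p-value filters together with the FDR step keeps only proteins that satisfy all three thresholds, which matches the "stringent parameters" wording of the description.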
