Dataset Information

Preserving biological heterogeneity with personalized genomics batch correction

ABSTRACT: Motivation: Sample source, procurement process, and other technical variations introduce batch effects into genomics data. Algorithms that remove these artifacts enhance differences between known biological covariates, but they also risk removing intra-group biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori. Results: Therefore, we introduce a new algorithm, personalized-SVA (pSVA), which is blind to biological covariates and corrects technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict HPV status, pSVA improved cross-study validation even if the sample batches were highly confounded with HPV status in the training set. Availability: All analyses were performed using R version 2.15.0. The code and data used to generate the results of this manuscript are available from https://sourceforge.net/projects/psva.

ORGANISM(S): Homo sapiens

PROVIDER: GSE53355 | GEO | 2016/12/12

SECONDARY ACCESSION(S): PRJNA231787

REPOSITORIES: GEO

Similar Datasets

Homo sapiens

Project description:Preserving biological heterogeneity with personalized genomics batch correction

| PRJNA231787 | ENA

Clinical and genomic characterization of chemoradiation resistant HPV-positive squamous cell carcinoma

Project description:This study performed genomic sequencing on a rare phenotype of HPV-positive oropharyngeal squamous cell carcinoma that was resistant to standard-of-care platinum-based chemoradiation treatment. Tissue was collected from archival FFPE, when available, from pretreatment and post-treatment samples from four patients, and these were batch corrected to be compared to known HPV-positive tumors without recurrent disease.

2024-02-23 | GSE256047 | GEO

Annotation- and Batch Effect Correction in TCGA IsomiR Expression Data [Third-party re-analysis]

Project description:The Cancer Genome Atlas (TCGA) Isoform Expression Quantification Data is the largest resource of isomiR-level sequenced cancer data publicly available. Since the datasets were built up over years and through different contributing institutions, they are not free of batch effects. We evaluated different batch correction approaches to remove batch effects in the data; details of the best performing algorithm and batch variables are included in the supplementary file. Additionally, annotation of the chromosomal end position of each isomiR feature was corrected by an offset of 1 to account for exclusive annotation.

2021-02-17 | GSE164767 | GEO

Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control

Project description:Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to maintain data quality. This dataset represents a systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts. It comprises twenty biological samples (cow vs. sheep) that were analysed repeatedly, in 8 batches across 7 days, together with a concurrent set of quality control (QC) samples. Data are presented from each step of the workflow and are available in MetaboLights. The strength of the dataset is that intra- and inter-batch variation can be corrected using QC spectra and the quality of this correction assessed independently using the repeatedly measured biological samples. Originally designed to test the efficacy of a batch-correction algorithm, it will enable others to evaluate novel data processing algorithms. Furthermore, this dataset serves as a benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment.

2022-05-16 | MTBLS79 | MetaboLights

Association of microRNAs with CD68 and NOS2 in IBD tissues.

Project description:We analyzed the association of CD68 and NOS2 mRNA expression with microRNAs in IBD tissues. For this analysis, data were LOESS normalized in R. Data were then imported into Partek Genomics Suite, and batch effects caused by day-to-day variability (Dategroup of array characteristic) were corrected. We did not have mRNA expression data for CD68 and NOS2 for all of the samples; therefore, those samples were not used for analysis. We identified several microRNAs associated with CD68 and NOS2 expression.

2012-08-01 | GSE29703 | GEO

Normalised mRNA expressions

Project description:Quantile-normalised and batch corrected

| EGAD00010001561 | EGA

A comprehensive comparison of differential accessibility analysis methods for ATAC-seq data

Project description:Background: ATAC-seq is widely used to measure chromatin accessibility and identify open chromatin regions (OCRs). OCRs usually indicate the active regulatory elements in the genome and are directly associated with gene regulatory networks. Identification of differential accessibility regions (DARs) between different biological conditions is critical to measure the differential activity of regulatory elements. Differential analysis of ATAC-seq shares many similarities with differential expression analysis of RNA-seq data. However, the distribution of ATAC-seq signal is different from that of RNA-seq data, and higher sensitivity is desired for DAR identification. Many different tools can be used to perform differential analysis of ATAC-seq data, but a comprehensive comparison and benchmarking of these methods is still missing. Methods: Here, we used simulated datasets to systematically measure the sensitivity and specificity of 6 different methods. We further discussed the statistical and signal density cutoffs in the differential analysis of ATAC-seq by applying the methods to real data. Batch effects are very common in high-throughput sequencing experiments. Results: We illustrated that batch-effect correction can dramatically improve the sensitivity of differential analysis of ATAC-seq data. Finally, we developed an easily usable package, BeCorrect, to perform batch-effect correction for visualizing corrected ATAC-seq signals on a genome browser. Conclusions: It is important to use PCA to check the sample distribution, and the Remove Unwanted Variation strategy can be used to correct the data to improve the sensitivity when strong batch effects are found in the data. Finally, BeCorrect can be used to correct the batch effect of ATAC-seq data signal based on DAR analysis, and generate a proper visualization on a genome browser.

2020-06-29 | GSE131144 | GEO

Metabolomics approach to understand the resource partitioning in Chlorella during growth

Project description:We use untargeted high-resolution mass spectrometry to understand the metabolic differences at exponential and stationary growth stages using 22 Chlorella strains collected from Southeast Asia. Using these data, which were compared over two growth stages and run in four batches, we demonstrate a filtering procedure based on the SVD of the untargeted metabolite profile data to remove structure in the data related to day of sample assay. This approach preserves signal of demonstrable biological origin (strain-related variation in mass feature intensity) after minimizing the influence of batch effects. Our approach will be broadly applicable in metabolomics analysis to identify and remove batch effects. See MTBLS129 for associated study.

2016-10-03 | MTBLS193 | MetaboLights
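
The SVD-based filtering mentioned in the Chlorella entry above can be illustrated with a minimal sketch. This is not the procedure used in MTBLS193, whose details are not given here; it assumes a samples-by-features matrix, a known batch label per sample, and a hypothetical rule that drops any singular component whose sample scores are mostly explained by batch. The function names and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np


def batch_eta_squared(scores, batches):
    """Fraction of the variance in `scores` explained by batch membership."""
    grand_mean = scores.mean()
    total = np.sum((scores - grand_mean) ** 2)
    between = sum(
        np.sum(batches == b) * (scores[batches == b].mean() - grand_mean) ** 2
        for b in np.unique(batches)
    )
    return between / total if total > 0 else 0.0


def svd_batch_filter(X, batches, threshold=0.5):
    """Remove singular components of X that are dominated by batch.

    X         : (n_samples, n_features) array
    batches   : length n_samples array of batch labels
    threshold : assumed cutoff on eta squared (hypothetical choice)
    Returns the filtered matrix and the eta squared of every component.
    """
    batches = np.asarray(batches)
    col_means = X.mean(axis=0)
    Xc = X - col_means  # center each feature before the decomposition
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eta = np.array(
        [batch_eta_squared(U[:, k], batches) for k in range(len(s))]
    )
    keep = eta < threshold  # components not dominated by batch
    X_filtered = (U[:, keep] * s[keep]) @ Vt[keep, :] + col_means
    return X_filtered, eta
```

Calling `svd_batch_filter(X, batches)` returns the reconstructed matrix together with the per-component batch association, so the removed components can be inspected. A component that also carries biological signal (for example, when strain and batch are confounded) would be removed along with the batch structure, which is the risk the pSVA abstract describes.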

Subtypes of HPV-positive head and neck cancers are associated with HPV characteristics, copy number variations, PIK3CA mutation, and pathway signatures. [SNP]

Project description:Purpose: There is substantial heterogeneity within human papillomavirus (HPV)-positive head and neck cancer (HNC) tumors that predisposes them to different outcomes; however, this subgroup is poorly characterized due to various historical reasons. Experimental Design: We perform unsupervised gene expression clustering on well-annotated HPV(+) HNC samples from two cohorts (84 total primary tumors), as well as 18 HPV(-) HNCs, to discover subtypes, and begin to characterize the differences between the subtypes in terms of their HPV characteristics, pathway activity, whole-genome somatic copy number variations and mutation frequencies. Results: We identified two distinctive HPV(+) subtypes by unsupervised clustering. Membership in the HPV(+) subtypes correlates with genic viral integration status, E2/E4/E5 expression levels and the ratio of spliced to full-length HPV oncogene E6. The subtypes also show differences in copy number alterations, in particular the loss of chr16q and gain of chr3q, in PIK3CA mutation, and in the expression of genes involved in several biological processes related to cancer, including immune response, oxidation-reduction process, and keratinocyte and mesenchymal differentiation. Conclusion: Our characterization of two subtypes of HPV(+) tumors provides valuable molecular-level information in relation to the alternative paths to tumor development and to that of HPV(-) HNCs. Together, these results will shed light on stratifications of the HPV(+) HNCs and will help to guide personalized care for HPV(+) HNC patients.

2016-05-13 | GSE74949 | GEO

Subtypes of HPV-positive head and neck cancers are associated with HPV characteristics, copy number variations, PIK3CA mutation, and pathway signatures. [RNA-Seq]

2016-05-13 | GSE74927 | GEO