Dataset Information

Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences.

ABSTRACT:

Background

The reproducibility of transcriptomic biomarkers across datasets remains poor, limiting clinical application. We and others have suggested that this is caused in part by differential error structure between datasets and its incomplete removal by pre-processing algorithms.

Methods

To test this hypothesis, we systematically assessed the effects of pre-processing on biomarker classification using 24 different pre-processing methods and 15 distinct signatures of tumour hypoxia in 10 datasets (2,143 patients).

Results

We confirm strong pre-processing effects for all datasets and signatures, and find that these differ between microarray versions. Importantly, combining different pre-processing techniques in an ensemble improved classification for a majority of signatures.

Conclusions

Assessing biomarkers using an ensemble of pre-processing techniques shows clear value across multiple diseases, datasets and biomarkers. Importantly, ensemble classification improves biomarkers with initially good results but does not result in spuriously improved performance for poor biomarkers. While further research is required, this approach has the potential to become a standard for transcriptomic biomarkers.
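
The ensemble idea described in the abstract can be illustrated with a minimal sketch. It assumes each pre-processing pipeline has already produced one class call per patient for a given signature, and it combines those calls by majority vote, leaving a patient unclassified when no call wins a clear majority. The voting rule, the function name `ensemble_classify`, the `min_agreement` parameter, and the pipeline names and calls are all hypothetical illustrations; the abstract does not specify how the authors combined pre-processing methods.

```python
from collections import Counter


def ensemble_classify(calls_by_method, min_agreement=0.5):
    """Combine per-patient class calls from several pre-processing pipelines.

    calls_by_method maps a pipeline name to a list of class labels, one per
    patient, in the same patient order for every pipeline. A patient receives
    the most common label only if the fraction of pipelines voting for it
    exceeds min_agreement; otherwise the patient is left as None.
    (Majority voting is an assumption made for this sketch.)
    """
    pipelines = list(calls_by_method.values())
    if not pipelines:
        return []
    n_patients = len(pipelines[0])
    if any(len(calls) != n_patients for calls in pipelines):
        raise ValueError("every pipeline must classify the same patients")

    ensemble = []
    for patient_calls in zip(*pipelines):
        # Most frequent label across pipelines for this patient.
        label, votes = Counter(patient_calls).most_common(1)[0]
        agreed = votes / len(pipelines) > min_agreement
        ensemble.append(label if agreed else None)
    return ensemble


# Hypothetical calls for four patients from four pipelines.
calls = {
    "pipeline_a": ["high", "low", "high", "low"],
    "pipeline_b": ["high", "low", "low", "low"],
    "pipeline_c": ["high", "high", "low", "high"],
    "pipeline_d": ["high", "low", "high", "high"],
}
result = ensemble_classify(calls)
```

Raising `min_agreement` towards 1 keeps only the patients on whom nearly all pipelines agree, which trades coverage for consistency.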

SUBMITTER: Fox NS

PROVIDER: S-EPMC4061774 | biostudies-literature | 2014 Jun

REPOSITORIES: biostudies-literature

Publications

Ensemble analyses improve signatures of tumour hypoxia and reveal inter-platform differences.

Natalie S Fox, Maud H W Starmans, Syed Haider, Philippe Lambin, Paul C Boutros

BMC Bioinformatics, 2014 Jun 6

PMID: 24902696

Similar Datasets

Project description: Freshwater can support the survival of the enteric pathogen Salmonella, though temporal Salmonella diversity in a large watershed has not been assessed. At 28 locations within the Susquehanna River basin, 10-liter samples were assessed in spring and summer over 2 years. Salmonella prevalence was 49%, and increased river discharge was the main driver of Salmonella presence. The amplicon-based sequencing tool, CRISPR-SeroSeq, was used to determine serovar population diversity and detected 25 different Salmonella serovars, including up to 10 serovars from a single water sample. On average, there were three serovars per sample, and 80% of Salmonella-positive samples contained more than one serovar. Serovars Give, Typhimurium, Thompson, and Infantis were identified throughout the watershed and over multiple collections. Seasonal differences were evident: serovar Give was abundant in the spring, whereas serovar Infantis was more frequently identified in the summer. Eight of the ten serovars most commonly associated with human illness were detected in this study. Crucially, six of these serovars often existed in the background, where they were masked by a more abundant serovar(s) in a sample. Serovars Enteritidis and Typhimurium, especially, were masked in 71 and 78% of samples where they were detected, respectively. Whole-genome sequencing-based phylogeny demonstrated that strains within the same serovar collected throughout the watershed were also very diverse. The Susquehanna River basin is the largest system where Salmonella prevalence and serovar diversity have been temporally and spatially investigated, and this study reveals an extraordinary level of inter- and intraserovar diversity.

IMPORTANCE

Salmonella is a leading cause of bacterial foodborne illness in the United States, and outbreaks linked to fresh produce are increasing. Understanding Salmonella ecology in freshwater is of importance, especially where irrigation practices or recreational use occur. As the third largest river in the United States east of the Mississippi, the Susquehanna River is the largest freshwater contributor to the Chesapeake Bay, and it is the largest river system where Salmonella diversity has been studied. Rainfall and subsequent high river discharge rates were the greatest indicators of Salmonella presence in the Susquehanna and its tributaries. Several Salmonella serovars were identified, including eight commonly associated with foodborne illness. Many clinically important serovars were present at a low frequency within individual samples and so could not be detected by conventional culture methods. The technologies employed here reveal an average of three serovars in a 10-liter sample of water and up to 10 serovars in a single sample.
