Dataset Information

Curated compendium of human transcriptional biomarker data.


ABSTRACT: One important use of genome-wide transcriptional profiles is to identify relationships between transcription levels and patient outcomes. These translational insights can guide the development of biomarkers for clinical application. Data from thousands of translational-biomarker studies have been deposited in public repositories, enabling reuse. However, data-reuse efforts require considerable time and expertise because transcriptional data are generated using heterogeneous profiling technologies, preprocessed using diverse normalization procedures, and annotated in non-standard ways. To address this problem, we curated 45 publicly available, translational-biomarker datasets from a variety of human diseases. To increase the data's utility, we reprocessed the raw expression data using a uniform computational pipeline, addressed quality-control problems, mapped the clinical annotations to a controlled vocabulary, and prepared consistently structured, analysis-ready data files. These data, along with the scripts we used to prepare them, are available in a public repository. We believe these data will be particularly useful to researchers seeking to perform benchmarking studies, for example, to compare and optimize machine-learning algorithms' ability to predict biomedical outcomes.

SUBMITTER: Golightly NP 

PROVIDER: S-EPMC5903354 | biostudies-literature | 2018 Apr

REPOSITORIES: biostudies-literature

Publications

Curated compendium of human transcriptional biomarker data.

Golightly NP, Bell A, Bischoff AI, Hollingsworth PD, Piccolo SR

Scientific Data, 2018-04-17


Similar Datasets

| S-EPMC8728275 | biostudies-literature
| S-EPMC10858950 | biostudies-literature
| S-EPMC4856112 | biostudies-literature
| S-EPMC6137171 | biostudies-literature
| S-EPMC9526701 | biostudies-literature
| S-EPMC4021103 | biostudies-literature
| S-EPMC2248174 | biostudies-literature
| S-EPMC5320065 | biostudies-literature
| S-EPMC10987564 | biostudies-literature
| S-EPMC9217108 | biostudies-literature