Dataset Information

Accounting for measurement error in biomarker data and misclassification of subtypes in the analysis of tumor data.

ABSTRACT: A common paradigm for dealing with heterogeneity across tumors in cancer analysis is to cluster the tumors into subtypes using marker data on the tumor, and then to analyze each cluster separately. A more specific target is to investigate the association between risk factors and specific subtypes and to use the results for personalized preventive treatment. This task is usually carried out in two steps: clustering and risk factor assessment. However, two sources of measurement error arise in these problems. The first is measurement error in the biomarker values. The second is misclassification error when assigning observations to clusters. We consider the case with a specified set of relevant markers and propose a unified single-likelihood approach for normally distributed biomarkers. As an alternative, we consider a two-step procedure in which the tumor type misclassification error is taken into account in the second-step risk factor analysis. We describe our method for binary data and for survival data, using a modified version of the Cox model for the latter. We present asymptotic theory for the proposed estimators. Simulation results indicate that our methods significantly reduce bias at a small cost in variance. We present an analysis of breast cancer data from the Nurses' Health Study to demonstrate the utility of our method. Copyright © 2016 John Wiley & Sons, Ltd.
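To make the link between the two error sources concrete, here is a minimal sketch of how additive measurement error in a biomarker turns into subtype misclassification. It assumes a single biomarker, two subtypes with known means mu0 < mu1 and a common within-subtype standard deviation tau, and an observed value W = X + U with U ~ N(0, sigma^2), so that W given subtype k is N(mu_k, tau^2 + sigma^2). Under these assumptions a hard rule that cuts at the midpoint misclassifies a subtype-1 tumor with probability Phi(-(mu1 - mu0) / (2 sqrt(tau^2 + sigma^2))), which grows with sigma. The function names and the soft posterior-probability alternative are hypothetical illustrations, not the authors' single-likelihood or two-step estimators.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(x, mean, sd):
    """Normal density N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def misclassification_prob(mu0, mu1, tau, sigma):
    """P(midpoint rule assigns a true subtype-1 tumor to subtype 0).

    Assumes X | S=k ~ N(mu_k, tau^2), W = X + U, U ~ N(0, sigma^2), mu1 > mu0.
    """
    sd_w = math.sqrt(tau ** 2 + sigma ** 2)  # SD of the observed marker W within a subtype
    return normal_cdf(-(mu1 - mu0) / (2.0 * sd_w))

def posterior_subtype1(w, mu0, mu1, tau, sigma, pi1=0.5):
    """Soft assignment: P(S=1 | W=w) under the same normal model and prior pi1."""
    sd_w = math.sqrt(tau ** 2 + sigma ** 2)
    f1 = pi1 * normal_pdf(w, mu1, sd_w)
    f0 = (1.0 - pi1) * normal_pdf(w, mu0, sd_w)
    return f1 / (f0 + f1)

def simulate_misclassification(mu0, mu1, tau, sigma, n=100_000, seed=1):
    """Monte Carlo check of misclassification_prob for subtype-1 tumors."""
    rng = random.Random(seed)
    cut = (mu0 + mu1) / 2.0
    wrong = 0
    for _ in range(n):
        x = rng.gauss(mu1, tau)          # true marker value of a subtype-1 tumor
        w = x + rng.gauss(0.0, sigma)    # observed marker with additive normal error
        if w < cut:                      # hard assignment to subtype 0
            wrong += 1
    return wrong / n
```

For example, with mu0 = 0, mu1 = 2 and tau = 1, the misclassification probability is about 0.159 without measurement error (sigma = 0) and about 0.240 with sigma = 1. Treating the hard assignments as error-free in a second-step risk factor analysis is what introduces the bias the abstract refers to.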

SUBMITTER: Nevo D

PROVIDER: S-EPMC5562152 | biostudies-literature | 2016 Dec

REPOSITORIES: biostudies-literature


Publications

Accounting for measurement error in biomarker data and misclassification of subtypes in the analysis of tumor data.

Daniel Nevo, David M. Zucker, Rulla M. Tamimi, Molin Wang

Statistics in Medicine, 24 August 2016, issue 30


PMID: 27558651

Similar Datasets

Project description: Studies on the effects of air pollution, and more generally of environmental exposures, on health require measurements of pollutants, which are affected by measurement error. This error biases the estimation of parameters relevant to the study and can lead to inaccurate conclusions when evaluating associations among pollutants, disease risk and biomarkers. Although measurement error in such studies has been recognized as a potential problem, it is rarely considered in applications, and practical solutions are still lacking. In this work, we formulate Bayesian measurement error models and apply them to study the link between air pollution and omic signals. The data stem from the "Oxford Street II Study", a randomized crossover trial in which 60 volunteers walked for two hours in a traffic-free area (Hyde Park) and in a busy shopping street (Oxford Street) in London. Metabolomic and air pollution measurements were made for each individual in order to investigate the association between short-term exposure to traffic-related air pollution and perturbation of metabolic pathways. We implemented error-corrected models in a classical framework and used the flexibility of Bayesian hierarchical models to account for dependencies among omic signals, as well as among different pollutants. Models were fitted using traditional Markov chain Monte Carlo (MCMC) simulation as well as integrated Laplace approximation. Including a classical measurement error term changed the estimated associations between omic signals and traffic-related air pollution measurements in varying ways, and the direction of the bias could not be predicted a priori. The models successfully accounted for different correlation structures, both among omic signals and among different pollutant exposures.
In general, more associations were identified when the correlations among omic signals and among pollutants were modeled, and their number increased further when a measurement error term was also included in the multivariate models (particularly for the associations between metabolomics and NO2).
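As a point of reference for the classical measurement error term mentioned above, the following minimal sketch shows its effect in the simplest possible setting: one exposure and a linear outcome model. It assumes W = X + U with U independent of X and of the outcome, and an error variance var_u that is known (for example, from replicate measurements). In this univariate case the naive slope is attenuated by the reliability ratio var(X) / var(W), and dividing by an estimate of that ratio (a basic regression-calibration correction) recovers the true slope. This is a frequentist illustration with hypothetical function names, not the Bayesian hierarchical models used in the study; in multivariate models with correlated exposures the bias need not be an attenuation, consistent with the description above.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (model with intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def sample_variance(values):
    """Unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def naive_and_corrected_slope(beta=1.0, var_x=1.0, var_u=0.5, n=50_000, seed=7):
    """Simulate classical measurement error and compare naive vs. corrected slopes.

    Assumes var_u (the measurement error variance) is known.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, var_x ** 0.5) for _ in range(n)]   # true exposure (unobserved)
    ws = [x + rng.gauss(0.0, var_u ** 0.5) for x in xs]      # measured exposure W = X + U
    ys = [beta * x + rng.gauss(0.0, 1.0) for x in xs]        # outcome depends on the true X
    naive = ols_slope(ws, ys)                                # regress on the error-prone W
    var_w = sample_variance(ws)
    reliability = (var_w - var_u) / var_w                    # estimate of var(X) / var(W)
    return naive, reliability, naive / reliability
```

With the default settings (beta = 1, var_x = 1, var_u = 0.5) the reliability ratio is 2/3, so the naive slope should come out near 0.67 and the corrected slope near 1.0, up to simulation noise.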
