Dataset Information

Using multivariate mixed-effects selection models for analyzing batch-processed proteomics data with non-ignorable missingness.

ABSTRACT: In quantitative proteomics, mass tag labeling techniques have been widely adopted in mass spectrometry experiments. These techniques allow peptides (short amino acid sequences) and proteins from multiple samples of a batch to be detected and quantified in a single experiment, and as such greatly improve the efficiency of protein profiling. However, the batch-processing of samples also results in severe batch effects and non-ignorable missing data occurring at the batch level. Motivated by the breast cancer proteomic data from the Clinical Proteomic Tumor Analysis Consortium, in this work, we developed two tailored multivariate MIxed-effects SElection models (mvMISE) to jointly analyze multiple correlated peptides/proteins in labeled proteomics data, considering the batch effects and the non-ignorable missingness. By taking a multivariate approach, we can borrow information across multiple peptides of the same protein or multiple proteins from the same biological pathway, and thus achieve better statistical efficiency and biological interpretation. These two different models account for different correlation structures among a group of peptides or proteins. Specifically, to model multiple peptides from the same protein, we employed a factor-analytic random effects structure to characterize the high and similar correlations among peptides. To model biological dependence among multiple proteins in a functional pathway, we introduced a graphical lasso penalty on the error precision matrix, and implemented an efficient algorithm based on the alternating direction method of multipliers. Simulations demonstrated the advantages of the proposed models. Applying the proposed methods to the motivating data set, we identified phosphoproteins and biological pathways that showed different activity patterns in triple-negative breast tumors versus other breast tumors. The proposed methods can also be applied to other high-dimensional multivariate analyses based on clustered data with or without non-ignorable missingness.
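
As a rough illustration of why non-ignorable missingness at the batch level matters, the sketch below simulates one peptide measured across many batches and drops whole batches with a probability that depends on the batch's mean abundance. The logistic missingness mechanism, the parameter values, and the names (`simulate_batches`, `phi0`, `phi1`) are assumptions made for illustration only; they are not taken from the mvMISE models or from the CPTAC data.

```python
import math
import random
import statistics


def simulate_batches(n_batches=2000, batch_size=4, mu=0.0,
                     sd_batch=1.0, sd_error=0.5,
                     phi0=0.0, phi1=-2.0, seed=1):
    """Simulate abundances with a batch random effect and batch-level missingness.

    Hypothetical mechanism (an assumption, not the paper's model): a whole
    batch is missing with probability logistic(phi0 + phi1 * batch_mean),
    so low-abundance batches are more likely to be unobserved when phi1 < 0.
    """
    rng = random.Random(seed)
    all_values, observed_values = [], []
    for _ in range(n_batches):
        b = rng.gauss(0.0, sd_batch)  # random effect shared by the batch
        batch = [mu + b + rng.gauss(0.0, sd_error) for _ in range(batch_size)]
        batch_mean = statistics.fmean(batch)
        p_missing = 1.0 / (1.0 + math.exp(-(phi0 + phi1 * batch_mean)))
        all_values.extend(batch)
        if rng.random() >= p_missing:  # the batch is observed
            observed_values.extend(batch)
    return all_values, observed_values


all_values, observed_values = simulate_batches()
full_mean = statistics.fmean(all_values)
naive_mean = statistics.fmean(observed_values)
print(f"mean of complete data: {full_mean:.3f}")
print(f"mean of observed data: {naive_mean:.3f}")
```

In a simulation like this, the observed-data mean sits above the complete-data mean because low-abundance batches are preferentially lost. A selection model addresses that bias by modeling the outcome and the missingness mechanism jointly instead of analyzing only the observed values.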

SUBMITTER: Wang J

PROVIDER: S-EPMC6797056 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

Using multivariate mixed-effects selection models for analyzing batch-processed proteomics data with non-ignorable missingness.

Jiebiao Wang, Pei Wang, Donald Hedeker, Lin S. Chen

Biostatistics (Oxford, England), 2019-10-01, issue 4

PMID: 29939200

Similar Datasets

Antedependence models for nonstationary categorical longitudinal data with ignorable missingness: likelihood-based inference. | S-EPMC3885186 | biostudies-literature

General location multivariate latent variable models for mixed correlated bounded continuous, ordinal, and nominal responses with non-ignorable missing data. | S-EPMC9042174 | biostudies-literature

Fully Bayesian inference under ignorable missingness in the presence of auxiliary covariates. | S-EPMC4007313 | biostudies-literature

Semi-parametric methods of handling missing data in mortal cohorts under non-ignorable missingness. | S-EPMC6481558 | biostudies-literature

Analyzing multiple outcomes in clinical research using multivariate multilevel models. | S-EPMC4119868 | biostudies-literature

Bayesian latent-class mixed-effect hybrid models for dyadic longitudinal data with non-ignorable dropouts. | S-EPMC3970927 | biostudies-literature

Multivariate Air Pollution Prediction Modeling with partial Missingness. | S-EPMC6980235 | biostudies-literature

Multivariate strategy for the sample selection and integration of multi-batch data in metabolomics. | S-EPMC5570768 | biostudies-literature

Negative Binomial Mixed Models for Analyzing Longitudinal Microbiome Data. | S-EPMC6070621 | biostudies-literature

Causal inference using multivariate generalized linear mixed-effects models. | S-EPMC11422711 | biostudies-literature