Dataset Information

Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics.


ABSTRACT: Understanding sub-cellular protein localisation is an essential component in the analysis of context-specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high-resolution mapping of thousands of proteins to sub-cellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a non-parametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a sub-cellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e. proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
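
The abstract's remark about exploiting covariance structure can be illustrated with a small sketch. Trench- and Durbin-type algorithms apply to Toeplitz matrices, which arise, for example, when a stationary kernel is evaluated on an equally spaced grid. The code below is a plain Levinson-type recursion for a symmetric Toeplitz system, not the extended algorithms of the paper. The squared-exponential kernel, the grid, the hyperparameter values, and the function names `solve_symmetric_toeplitz` and `toeplitz_matvec` are all assumptions made for illustration.

```python
import math


def solve_symmetric_toeplitz(t, y):
    """Solve T x = y in O(n^2) time, where T is the symmetric Toeplitz
    matrix whose first column is t (a Levinson-type recursion).
    Assumes every leading principal block of T is non-singular."""
    n = len(t)
    f = [1.0 / t[0]]   # forward vector: T_k f = first unit vector
    x = [y[0] / t[0]]  # solution of the leading k-by-k system
    for k in range(1, n):
        # Error introduced by padding f with a zero.
        eps = sum(t[k - i] * f[i] for i in range(k))
        denom = 1.0 - eps * eps
        b = f[::-1]    # backward vector (reverse of f, by symmetry)
        f = [(fe - eps * be) / denom
             for fe, be in zip(f + [0.0], [0.0] + b)]
        b_new = f[::-1]
        # Correct the padded solution so the new last equation holds.
        eps_x = sum(t[k - i] * x[i] for i in range(k))
        x = [xe + (y[k] - eps_x) * bn
             for xe, bn in zip(x + [0.0], b_new)]
    return x


def toeplitz_matvec(t, x):
    """Multiply the symmetric Toeplitz matrix defined by t with x."""
    n = len(t)
    return [sum(t[abs(i - j)] * x[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical setup: squared-exponential kernel on 8 equally spaced
    # points, plus a noise variance on the diagonal (all values assumed).
    length_scale, amplitude, noise = 2.0, 1.0, 0.1
    t = [amplitude * math.exp(-(k ** 2) / (2.0 * length_scale ** 2))
         for k in range(8)]
    t[0] += noise
    y = [math.sin(0.5 * i) for i in range(8)]
    x = solve_symmetric_toeplitz(t, y)
    residual = max(abs(a - b) for a, b in zip(toeplitz_matvec(t, x), y))
    print("max residual:", residual)
```

The recursion only ever touches the first column of the matrix, so the full covariance matrix is never formed or inverted explicitly. That is the general source of the savings that structured solvers of this family offer over a dense cubic-cost inversion.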

SUBMITTER: Crook OM 

PROVIDER: S-EPMC7613899 | biostudies-literature | 2022 Dec

REPOSITORIES: biostudies-literature

Publications

Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics.

Crook OM, Lilley KS, Gatto L, Kirk PDW

The Annals of Applied Statistics, 2022-12-01, issue 4


Similar Datasets

| S-EPMC8340378 | biostudies-literature
| S-EPMC4111556 | biostudies-literature
| S-EPMC6258510 | biostudies-literature
| S-EPMC10810649 | biostudies-literature
| S-EPMC9035098 | biostudies-literature
| S-EPMC8241860 | biostudies-literature
| S-EPMC10186154 | biostudies-literature
| S-EPMC5963472 | biostudies-literature
| S-EPMC2876132 | biostudies-literature
| S-EPMC9826400 | biostudies-literature