Dataset Information

Integrative factorization of bidimensionally linked matrices.

ABSTRACT: Advances in molecular "omics" technologies have motivated new methodologies for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (eg, multiple cohorts measured on multiple platforms), which are increasingly common in large-scale biomedical studies. In this paper, we propose bidimensional integrative factorization (BIDIFAC) for integrative dimension reduction and signal approximation of bidimensionally linked data matrices. Our method factorizes data into (a) globally shared, (b) row-shared, (c) column-shared, and (d) single-matrix structural components, facilitating the investigation of shared and unique patterns of variability. For estimation, we use a penalized objective function that extends the nuclear norm penalization for a single matrix. As an alternative to the complicated rank selection problem, we use results from the random matrix theory to choose tuning parameters. We apply our method to integrate two genomics platforms (messenger RNA and microRNA expression) across two sample cohorts (tumor samples and normal tissue samples) using the breast cancer data from the Cancer Genome Atlas. We provide R code for fitting BIDIFAC, imputing missing values, and generating simulated data.

SUBMITTER: Park JY

PROVIDER: S-EPMC7036334 | biostudies-literature | 2020 Mar

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Integrative factorization of bidimensionally linked matrices.

Park Jun Young JY Lock Eric F EF

Biometrics 20191110 1

Advances in molecular "omics" technologies have motivated new methodologies for the integration of multiple sources of high-content biomedical data. However, most statistical methods for integrating multiple data matrices only consider data shared vertically (one cohort on multiple platforms) or horizontally (different cohorts on a single platform). This is limiting for data that take the form of bidimensionally linked matrices (eg, multiple cohorts measured on multiple platforms), which are inc ...[more]

PMID: 31444786

Similar Datasets

Project description:MotivationRNA binding proteins (RBPs) play important roles in post-transcriptional control of gene expression, including splicing, transport, polyadenylation and RNA stability. To model protein-RNA interactions by considering all available sources of information, it is necessary to integrate the rapidly growing RBP experimental data with the latest genome annotation, gene function, RNA sequence and structure. Such integration is possible by matrix factorization, where current approaches have an undesired tendency to identify only a small number of the strongest patterns with overlapping features. Because protein-RNA interactions are orchestrated by multiple factors, methods that identify discriminative patterns of varying strengths are needed.ResultsWe have developed an integrative orthogonality-regularized nonnegative matrix factorization (iONMF) to integrate multiple data sources and discover non-overlapping, class-specific RNA binding patterns of varying strengths. The orthogonality constraint halves the effective size of the factor model and outperforms other NMF models in predicting RBP interaction sites on RNA. We have integrated the largest data compendium to date, which includes 31 CLIP experiments on 19 RBPs involved in splicing (such as hnRNPs, U2AF2, ELAVL1, TDP-43 and FUS) and processing of 3'UTR (Ago, IGF2BP). We show that the integration of multiple data sources improves the predictive accuracy of retrieval of RNA binding sites. In our study the key predictive factors of protein-RNA interactions were the position of RNA structure and sequence motifs, RBP co-binding and gene region type. We report on a number of protein-specific patterns, many of which are consistent with experimentally determined properties of RBPs.Availability and implementationThe iONMF implementation and example datasets are available at https://github.com/mstrazar/ionmfContact: tomaz.curk@fri.uni-lj.siSupplementary informationSupplementary data are available at Bioinformatics online.

Dataset Information

Integrative factorization of bidimensionally linked matrices.

Publications

Integrative factorization of bidimensionally linked matrices.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets