Network-based integrative clustering of multiple types of genomic data using non-negative matrix factorization.
ABSTRACT: Identification of novel molecular subtypes of disease using multi-source 'omics data is an active area of ongoing research. Integrative clustering is a powerful approach for identifying the latent subtype structure inherent in the data sets while accounting for both between- and within-data correlations. We propose a new integrative network-based clustering method based on non-negative matrix factorization, nNMF, for clustering multiple types of interrelated datasets assayed on the same tumor samples. nNMF treats the consensus matrices generated by running the non-negative matrix factorization (NMF) algorithm on each type of data as networks among the patient samples. The multiple networks are then combined, and a comprehensive network structure is created by optimizing the strengths of the relationships. A spectral clustering algorithm is then applied to the final network to determine the cluster groups. nNMF is a non-parametric method, and therefore prior assumptions on the statistical distribution of the data are not required. The proposed nNMF method is applied to simulated datasets and to real-life datasets obtained from The Cancer Genome Atlas studies on glioblastoma, lower grade glioma, and head and neck cancer. nNMF performed competitively with previous methods, and sometimes better than previous NMF or model-based methods, especially when the signal-to-noise ratio is small. The novel nNMF method allows researchers to utilize these between- and within-data relationships to identify the latent subtype structure inherent in the data so that further association studies can be carried out. The R program for nNMF will be available upon request.
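The pipeline described in the abstract (per-data-type NMF consensus matrices, a combined network, then spectral clustering) can be illustrated with a minimal Python sketch. This is not the authors' R program. All function names here are hypothetical, the NMF uses plain multiplicative updates, and the networks are combined by a simple average, which is only a stand-in assumption because the abstract does not specify how the strengths of the relationships are optimized.

```python
import numpy as np

def nmf(X, k, rng, n_iter=200, eps=1e-9):
    """Basic NMF via multiplicative updates: X (features x samples) ~ W @ H."""
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def consensus_matrix(X, k, n_runs=20, seed=0):
    """Fraction of NMF runs in which each pair of samples lands in the same cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    C = np.zeros((n, n))
    for _ in range(n_runs):
        _, H = nmf(X, k, rng)
        labels = H.argmax(axis=0)  # assign each sample to its dominant factor
        C += labels[:, None] == labels[None, :]
    return C / n_runs

def kmeans(Z, k, rng, n_iter=100):
    """Small k-means with farthest-point initialization."""
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(A, k, seed=0):
    """Normalized spectral clustering on a symmetric affinity matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    Z = vecs[:, -k:]                   # top-k eigenvectors as the embedding
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    return kmeans(Z, k, np.random.default_rng(seed))

def integrative_cluster(datasets, k):
    """Assumption: networks are combined by a plain average (illustrative only)."""
    networks = [consensus_matrix(X, k, seed=i) for i, X in enumerate(datasets)]
    combined = np.mean(networks, axis=0)
    return spectral_clustering(combined, k)

# Toy example: two non-negative data types measured on the same 20 samples,
# with samples 0-9 and 10-19 forming two planted groups.
rng = np.random.default_rng(1)
X1 = rng.random((30, 20)) * 0.1
X1[:15, :10] += 5.0
X1[15:, 10:] += 5.0
X2 = rng.random((20, 20)) * 0.1
X2[:10, :10] += 5.0
X2[10:, 10:] += 5.0
labels = integrative_cluster([X1, X2], k=2)
```

Each input matrix is arranged as features by samples, and every data type must share the same sample order so that the consensus matrices describe the same set of patients. The averaging step is where a weighting or optimization scheme would go in a fuller implementation.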
SUBMITTER: Chalise P
PROVIDER: S-EPMC7078030 | biostudies-literature | 2020 Mar
REPOSITORIES: biostudies-literature