Dataset Information

Learning common and specific patterns from data of multiple interrelated biological scenarios with matrix factorization.


ABSTRACT: High-throughput biological technologies (e.g. ChIP-seq, RNA-seq and single-cell RNA-seq) rapidly accelerate the accumulation of genome-wide omics data in diverse interrelated biological scenarios (e.g. cells, tissues and conditions). Integration and differential analysis are two common paradigms for exploring and analyzing such data. However, current integrative methods usually ignore the differential part, and typical differential analysis methods either fail to identify combinatorial patterns of difference or require matched dimensions of the data. Here, we propose a flexible framework CSMF to combine them into one paradigm to simultaneously reveal Common and Specific patterns via Matrix Factorization from data generated under interrelated biological scenarios. We demonstrate the effectiveness of CSMF with four representative applications including pairwise ChIP-seq data describing the chromatin modification map between K562 and Huvec cell lines; pairwise RNA-seq data representing the expression profiles of two different cancers; RNA-seq data of three breast cancer subtypes; and single-cell RNA-seq data of human embryonic stem cell differentiation at six time points. Extensive analysis yields novel insights into hidden combinatorial patterns in these multi-modal data. Results demonstrate that CSMF is a powerful tool to uncover common and specific patterns with significant biological implications from data of interrelated biological scenarios.
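The idea of splitting each dataset into a shared part and a dataset-specific part can be sketched as a joint non-negative factorization. The sketch below assumes a model of the form X_i ≈ W_c H_c,i + W_s,i H_s,i, where all matrices share the same rows (features) but may have different numbers of columns, and it fits that model with standard Lee-Seung style multiplicative updates. The function name `csmf_sketch`, its parameters, the squared-error objective and the random toy data are all assumptions made for illustration; this is not the authors' implementation, and the abstract does not specify the optimization procedure.

```python
import numpy as np


def csmf_sketch(Xs, k_common, k_specific, n_iter=200, seed=0, eps=1e-9):
    """Illustrative common/specific NMF (hypothetical helper, not the published code).

    Xs: list of non-negative arrays with the same number of rows.
    Returns the shared basis W_c, per-dataset bases W_s, coefficient
    matrices H_c and H_s, and the squared-error loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    m = Xs[0].shape[0]
    W_c = rng.random((m, k_common))                       # basis shared by all datasets
    W_s = [rng.random((m, k_specific)) for _ in Xs]       # one specific basis per dataset
    H_c = [rng.random((k_common, X.shape[1])) for X in Xs]
    H_s = [rng.random((k_specific, X.shape[1])) for X in Xs]
    losses = []

    for _ in range(n_iter):
        # Update coefficients with the bases held fixed.
        for i, X in enumerate(Xs):
            R = W_c @ H_c[i] + W_s[i] @ H_s[i]            # current reconstruction
            H_c[i] *= (W_c.T @ X) / (W_c.T @ R + eps)
            H_s[i] *= (W_s[i].T @ X) / (W_s[i].T @ R + eps)

        # Update bases with the coefficients held fixed.
        num = np.zeros_like(W_c)
        den = np.zeros_like(W_c)
        for i, X in enumerate(Xs):
            R = W_c @ H_c[i] + W_s[i] @ H_s[i]
            num += X @ H_c[i].T                           # shared basis pools all datasets
            den += R @ H_c[i].T
            W_s[i] *= (X @ H_s[i].T) / (R @ H_s[i].T + eps)
        W_c *= num / (den + eps)

        loss = sum(
            np.linalg.norm(X - (W_c @ H_c[i] + W_s[i] @ H_s[i])) ** 2
            for i, X in enumerate(Xs)
        )
        losses.append(loss)

    return W_c, W_s, H_c, H_s, losses


# Toy data: two matrices with 30 shared features and different sample counts.
rng = np.random.default_rng(1)
X1 = rng.random((30, 20))
X2 = rng.random((30, 35))
W_c, W_s, H_c, H_s, losses = csmf_sketch([X1, X2], k_common=3, k_specific=2)
```

In this sketch, columns of `W_c` play the role of common patterns and columns of each `W_s[i]` play the role of patterns specific to dataset `i`; the ranks `k_common` and `k_specific` are chosen arbitrarily here.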

SUBMITTER: Zhang L 

PROVIDER: S-EPMC6649783 | biostudies-literature | 2019 Jul

REPOSITORIES: biostudies-literature

Publications

Learning common and specific patterns from data of multiple interrelated biological scenarios with matrix factorization.

Zhang L, Zhang S

Nucleic Acids Research, 2019 Jul 1, issue 13


Similar Datasets

S-EPMC6206826 | biostudies-literature
S-EPMC7337516 | biostudies-literature
S-EPMC6311903 | biostudies-other
S-EPMC7332573 | biostudies-literature
S-EPMC4894278 | biostudies-literature
S-EPMC9130660 | biostudies-literature
S-EPMC4411332 | biostudies-literature
S-EPMC2800351 | biostudies-literature
S-EPMC5098501 | biostudies-literature
S-EPMC7488540 | biostudies-literature