Dataset Information

Efficient Test and Visualization of Multi-Set Intersections.


ABSTRACT: Identification of sets of objects with shared features is a common operation in all disciplines. Analysis of intersections among multiple sets is fundamental for in-depth understanding of their complex relationships. However, so far no method has been developed to assess statistical significance of intersections among three or more sets. Moreover, the state-of-the-art approaches for visualization of multi-set intersections are not scalable. Here, we first developed a theoretical framework for computing the statistical distributions of multi-set intersections based upon combinatorial theory, and then accordingly designed a procedure to efficiently calculate the exact probabilities of multi-set intersections. We further developed multiple efficient and scalable techniques to visualize multi-set intersections and the corresponding intersection statistics. We implemented both the theoretical framework and the visualization techniques in a unified R software package, SuperExactTest. We demonstrated the utility of SuperExactTest through an intensive simulation study and a comprehensive analysis of seven independently curated cancer gene sets as well as six disease or trait associated gene sets identified by genome-wide association studies. We expect SuperExactTest developed by this study will have a broad range of applications in scientific data analysis in many disciplines.
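The exact-probability idea in the abstract can be illustrated with a minimal sketch. It assumes every set is an independent, uniformly random subset of a common background of n items, and it builds the intersection-size distribution by applying the hypergeometric distribution one set at a time. This may not be the procedure implemented in SuperExactTest; the sketch is in Python rather than R, and the function names (hypergeom_pmf, intersection_pmf, upper_tail) are hypothetical, not part of the package.

```python
from math import comb


def hypergeom_pmf(x, n, marked, draws):
    """P(X = x) when `draws` items are taken without replacement
    from `n` items, `marked` of which are of interest."""
    if x < max(0, marked + draws - n) or x > min(marked, draws):
        return 0.0
    return comb(marked, x) * comb(n - marked, draws - x) / comb(n, draws)


def intersection_pmf(sizes, n):
    """Distribution of the intersection size of independent random
    subsets (with the given sizes) of an n-item background.
    Assumption: each subset is drawn uniformly at random."""
    # The first set on its own "intersects" in exactly sizes[0] items.
    dist = {sizes[0]: 1.0}
    for m in sizes[1:]:
        updated = {}
        for j, pj in dist.items():
            # Given a running intersection of j items, the overlap with
            # the next set of size m is hypergeometric.
            for x in range(min(j, m) + 1):
                p = hypergeom_pmf(x, n, j, m)
                if p > 0.0:
                    updated[x] = updated.get(x, 0.0) + pj * p
        dist = updated
    return dist


def upper_tail(observed, sizes, n):
    """One-sided p-value: P(intersection size >= observed)."""
    dist = intersection_pmf(sizes, n)
    return sum(p for x, p in dist.items() if x >= observed)
```

For example, upper_tail(5, [100, 120, 90], 2000) would give the probability that three random sets of those sizes, drawn from 2000 background items, share at least five items. With two sets the calculation reduces to the ordinary hypergeometric (Fisher-type) overlap test.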

SUBMITTER: Wang M 

PROVIDER: S-EPMC4658477 | biostudies-literature | 2015 Nov

REPOSITORIES: biostudies-literature

Publications

Efficient Test and Visualization of Multi-Set Intersections.

Wang M, Zhao Y, Zhang B

Scientific Reports, 2015-11-25

Similar Datasets

2012-06-21 | GSE35191 | GEO
| S-EPMC3673214 | biostudies-other
| PRJNA150683 | ENA
| PRJEB50795 | ENA
| S-EPMC5714447 | biostudies-literature
2012-06-20 | E-GEOD-35191 | biostudies-arrayexpress
2008-06-21 | E-TABM-289 | biostudies-arrayexpress
2012-06-21 | GSE35186 | GEO
2012-06-21 | GSE35189 | GEO
| PRJEB31782 | ENA