Dataset Information

Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components.


ABSTRACT: Principal component analysis (PCA) is one of the most common techniques in the analysis of biological data sets, but applying PCA raises 2 challenges. First, one must determine the number of significant principal components (PCs). Second, because each PC is a linear combination of genes, it rarely has a biological interpretation. Existing methods to determine the number of PCs are either subjective or computationally extensive. We review several methods and describe a new R package, PCDimension, that implements additional methods, the most important being an algorithm that extends and automates a graphical Bayesian method. Using simulations, we compared the methods. Our newly automated procedure is competitive with the best methods when considering both accuracy and speed and is the most accurate when the number of objects is small compared with the number of attributes. We applied the method to a proteomics data set from patients with acute myeloid leukemia. Proteins in the apoptosis pathway could be explained using 6 PCs. By clustering the proteins in PC space, we were able to replace the PCs by 6 "biological components," 3 of which could be immediately interpreted from the current literature. We expect this approach combining PCA with clustering to be widely applicable.
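The first challenge named in the abstract, choosing how many PCs are significant, can be illustrated with the broken-stick rule, a classical heuristic for this problem. This is a minimal sketch only: it is not the automated Bayesian procedure the abstract describes, and it does not use the PCDimension API. The function names and the eigenvalues below are hypothetical and chosen purely for illustration.

```python
def broken_stick_expectations(p):
    """Expected share of the k-th longest piece (k = 1..p) when a unit
    stick is broken at random into p pieces."""
    return [sum(1.0 / i for i in range(k, p + 1)) / p for k in range(1, p + 1)]


def broken_stick_dimension(eigenvalues):
    """Count the leading PCs whose share of total variance exceeds the
    corresponding broken-stick expectation."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)
    expected = broken_stick_expectations(len(ordered))
    count = 0
    for value, threshold in zip(ordered, expected):
        # Stop at the first PC that explains no more than chance would.
        if value / total <= threshold:
            break
        count += 1
    return count


# Hypothetical eigenvalues, made up for illustration
# (NOT taken from the acute myeloid leukemia data set).
example_eigenvalues = [5.0, 3.0, 0.5, 0.4, 0.3, 0.3, 0.2, 0.2, 0.05, 0.05]
n_components = broken_stick_dimension(example_eigenvalues)
print(n_components)
```

In this made-up example only the first two eigenvalues exceed their broken-stick thresholds, so two PCs would be retained. Heuristics like this are fast but can disagree with one another, which is the kind of trade-off between accuracy and speed that the abstract says was compared by simulation.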

SUBMITTER: Wang M 

PROVIDER: S-EPMC5987987 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature

Publications

Decomposing the Apoptosis Pathway Into Biologically Interpretable Principal Components.

Wang M, Kornblau SM, Coombes KR

Cancer Informatics, 2018-05-09


Similar Datasets

| S-EPMC10600308 | biostudies-literature
| S-EPMC3480088 | biostudies-literature
2013-03-14 | E-GEOD-26520 | biostudies-arrayexpress
| S-EPMC3623732 | biostudies-literature
2013-03-14 | GSE26520 | GEO
| S-EPMC6180590 | biostudies-literature
| S-EPMC8555838 | biostudies-literature
| S-EPMC2667477 | biostudies-literature
| S-EPMC3392282 | biostudies-literature
| S-EPMC7083277 | biostudies-literature