
Dataset Information


LARGE-SCALE MULTIPLE INFERENCE OF COLLECTIVE DEPENDENCE WITH APPLICATIONS TO PROTEIN FUNCTION.


ABSTRACT: Measuring the dependence of k ≥ 3 random variables and drawing inference from such higher-order dependences are scientifically important yet challenging. Motivated here by protein coevolution with multivariate categorical features, we consider an information-theoretic measure of higher-order dependence. The proposed collective dependence is a symmetrization of differential interaction information, which generalizes the mutual information of a pair of random variables. We show that the collective dependence can be easily estimated and facilitates a test on the dependence of k ≥ 3 random variables. Upon carefully exploring the null space of collective dependence, we devise a Classification-Assisted Large scaLe inference procedure to DEtect significant k-COllective DEpendence among dk random variables, with the false discovery rate controlled. The finite-sample performance of our method is examined via simulations. We apply this method to multiple protein sequence alignment data to study residue (position) coevolution in two protein families, the elongation factor P family and the zinc knuckle family. We identify novel functional triplets of amino acid residues, whose contributions to protein function are further investigated. These results confirm that collective dependence yields additional information, beyond that of pairwise measures, that is important for understanding protein coevolution.

SUBMITTER: Jernigan R 

PROVIDER: S-EPMC9337751 | biostudies-literature | 2021 Jun

REPOSITORIES: biostudies-literature


Publications

LARGE-SCALE MULTIPLE INFERENCE OF COLLECTIVE DEPENDENCE WITH APPLICATIONS TO PROTEIN FUNCTION.

Jernigan Robert R, Jia Kejue K, Ren Zhao Z, Zhou Wen W

The Annals of Applied Statistics, 2021-06-01, issue 2



Similar Datasets

S-EPMC9912996 | biostudies-literature
S-EPMC6221071 | biostudies-literature
S-EPMC3068136 | biostudies-literature
S-EPMC10540461 | biostudies-literature
S-EPMC7394464 | biostudies-literature
S-EPMC3584181 | biostudies-literature
S-EPMC6460059 | biostudies-literature
S-EPMC9120158 | biostudies-literature
S-EPMC3400961 | biostudies-literature
S-EPMC4729823 | biostudies-literature