Browse
Submit Data
Databases
API
Help

Dataset Information

28 Views

0 Connections

0 Citations

0 Reanalyses

0 Downloads

Omics score: 0

Trivial and nontrivial error sources account for misidentification of protein partners in mutual information approaches.

ABSTRACT: The problem of finding the correct set of partners for a given pair of interacting protein families based on multi-sequence alignments (MSAs) has received great attention over the years. Recently, the native contacts of two interacting proteins were shown to store the strongest mutual information (MI) signal to discriminate MSA concatenations with the largest fraction of correct pairings. Although that signal might be of practical relevance in the search for an effective heuristic to solve the problem, the number of MSA concatenations with near-native MI is large, imposing severe limitations. Here, a Genetic Algorithm that explores possible MSA concatenations according to a MI maximization criteria is shown to find degenerate solutions with two error sources, arising from mismatches among (i) similar and (ii) non-similar sequences. If mistakes made among similar sequences are disregarded, type-(i) solutions are found to resolve correct pairings at best true positive (TP) rates of 70%-far above the very same estimates in type-(ii) solutions. A machine learning classification algorithm helps to show further that differences between optimized solutions based on TP rates are not artificial and may have biological meaning associated with the three-dimensional distribution of the MI signal. Type-(i) solutions may therefore correspond to reliable results for predictive purposes, found here to be more likely obtained via MI maximization across protein systems having a minimum critical number of amino acid contacts on their interaction surfaces (N > 200).

SUBMITTER: Pontes C

PROVIDER: S-EPMC7994710 | biostudies-literature |

REPOSITORIES: biostudies-literature

ACCESS DATA

Json Xml

Similar Datasets

Nontrivial quantum oscillation geometric phase shift in a trivial band.

Project description:Quantum oscillations provide a notable visualization of the Fermi surface of metals, including associated geometrical phases such as Berry's phase, that play a central role in topological quantum materials. Here we report the existence of a new quantum oscillation phase shift in a multiband system. In particular, we study the ABA-trilayer graphene, the band structure of which is composed of a weakly gapped linear Dirac band, nested within a quadratic band. We observe that Shubnikov-de Haas (SdH) oscillations of the quadratic band are shifted by a phase that sharply departs from the expected 2π Berry's phase and is inherited from the nontrivial Berry's phase of the linear band. We find this arises due to an unusual filling enforced constraint between the quadratic band and linear band Fermi surfaces. Our work indicates how additional bands can be exploited to tease out the effect of often subtle quantum mechanical geometric phases.

| S-EPMC6799982 | biostudies-literature

Inferring interaction partners from protein sequences using mutual information.

Project description:Functional protein-protein interactions are crucial in most cellular processes. They enable multi-protein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are functional interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. Our mutual information-based method also provides signatures of the existence of interactions between protein families. These results stand in contrast with structure prediction of proteins and of multi-protein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.

| S-EPMC6258550 | biostudies-literature

The Impact of Different Sources of Fluctuations on Mutual Information in Biochemical Networks.

Project description:Stochastic fluctuations in signaling and gene expression limit the ability of cells to sense the state of their environment, transfer this information along cellular pathways, and respond to it with high precision. Mutual information is now often used to quantify the fidelity with which information is transmitted along a cellular pathway. Mutual information calculations from experimental data have mostly generated low values, suggesting that cells might have relatively low signal transmission fidelity. In this work, we demonstrate that mutual information calculations might be artificially lowered by cell-to-cell variability in both initial conditions and slowly fluctuating global factors across the population. We carry out our analysis computationally using a simple signaling pathway and demonstrate that in the presence of slow global fluctuations, every cell might have its own high information transmission capacity but that population averaging underestimates this value. We also construct a simple synthetic transcriptional network and demonstrate using experimental measurements coupled to computational modeling that its operation is dominated by slow global variability, and hence that its mutual information is underestimated by a population averaged calculation.

| S-EPMC4615624 | biostudies-literature

Prediction of lncRNA-disease associations by integrating diverse heterogeneous information sources with RWR algorithm and positive pointwise mutual information.

Project description:BackgroundLong non-coding RNAs play an important role in human complex diseases. Identification of lncRNA-disease associations will gain insight into disease-related lncRNAs and benefit disease diagnoses and treatment. However, using experiments to explore the lncRNA-disease associations is expensive and time consuming.ResultsIn this study, we developed a novel method to identify potential lncRNA-disease associations by Integrating Diverse Heterogeneous Information sources with positive pointwise Mutual Information and Random Walk with restart algorithm (namely IDHI-MIRW). IDHI-MIRW first constructs multiple lncRNA similarity networks and disease similarity networks from diverse lncRNA-related and disease-related datasets, then implements the random walk with restart algorithm on these similarity networks for extracting the topological similarities which are fused with positive pointwise mutual information to build a large-scale lncRNA-disease heterogeneous network. Finally, IDHI-MIRW implemented random walk with restart algorithm on the lncRNA-disease heterogeneous network to infer potential lncRNA-disease associations.ConclusionsCompared with other state-of-the-art methods, IDHI-MIRW achieves the best prediction performance. In case studies of breast cancer, stomach cancer, and colorectal cancer, 36/45 (80%) novel lncRNA-disease associations predicted by IDHI-MIRW are supported by recent literatures. Furthermore, we found lncRNA LINC01816 is associated with the survival of colorectal cancer patients. IDHI-MIRW is freely available at https://github.com/NWPU-903PR/IDHI-MIRW .

| S-EPMC6381749 | biostudies-literature

Equitability, mutual information, and the maximal information coefficient.

Project description:How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequality. Mutual information, a fundamental quantity in information theory, is shown to satisfy this equitability criterion. These findings are at odds with the recent work of Reshef et al. [Reshef DN, et al. (2011) Science 334(6062):1518-1524], which proposed an alternative definition of equitability and introduced a new statistic, the "maximal information coefficient" (MIC), said to satisfy equitability in contradistinction to mutual information. These conclusions, however, were supported only with limited simulation evidence, not with mathematical arguments. Upon revisiting these claims, we prove that the mathematical definition of equitability proposed by Reshef et al. cannot be satisfied by any (nontrivial) dependence measure. We also identify artifacts in the reported simulation evidence. When these artifacts are removed, estimates of mutual information are found to be more equitable than estimates of MIC. Mutual information is also observed to have consistently higher statistical power than MIC. We conclude that estimating mutual information provides a natural (and often practical) way to equitably quantify statistical associations in large datasets.

| S-EPMC3948249 | biostudies-literature

Mutual Information and Categorical Perception.

Project description:Categorical perception refers to the enhancement of perceptual sensitivity near category boundaries, generally along dimensions that are informative about category membership. However, it remains unclear exactly which dimensions are treated as informative and why. This article reports a series of experiments in which subjects were asked to learn statistically defined categories in a novel, unfamiliar 2D perceptual space of shapes. Perceptual discrimination was tested before and after category learning of various features in the space, each defined by its position and orientation relative to the maximally informative dimension. The results support a remarkably simple generalization: The magnitude of improvement in perceptual discrimination of each feature is proportional to the mutual information between the feature and the category variable. This finding suggests a rational basis for categorical perception in which the precision of perceptual discrimination is tuned to the statistical structure of the environment.

| S-EPMC8726593 | biostudies-literature

Pathway analysis through mutual information.

Project description:MotivationIn pathway analysis, we aim to establish a connection between the activity of a particular biological pathway and a difference in phenotype. There are many available methods to perform pathway analysis, many of them rely on an upstream differential expression analysis, and many model the relations between the abundances of the analytes in a pathway as linear relationships.ResultsHere, we propose a new method for pathway analysis, MIPath, that relies on information theoretical principles and, therefore, does not model the association between pathway activity and phenotype, resulting in relatively few assumptions. For this, we construct a graph of the data points for each pathway using a nearest-neighbor approach and score the association between the structure of this graph and the phenotype of these same samples using Mutual Information while adjusting for the effects of random chance in each score. The initial nearest neighbor approach evades individual gene-level comparisons, hence making the method scalable and less vulnerable to missing values. These properties make our method particularly useful for single-cell data. We benchmarked our method on several single-cell datasets, comparing it to established and new methods, and found that it produces robust, reproducible, and meaningful scores.Availability and implementationSource code is available at https://github.com/statisticalbiotechnology/mipath, or through Python Package Index as "mipathway."

| S-EPMC10783954 | biostudies-literature

Inferring network properties from time series using transfer entropy and mutual information: Validation of multivariate versus bivariate approaches.

Project description:Functional and effective networks inferred from time series are at the core of network neuroscience. Interpreting properties of these networks requires inferred network models to reflect key underlying structural features. However, even a few spurious links can severely distort network measures, posing a challenge for functional connectomes. We study the extent to which micro- and macroscopic properties of underlying networks can be inferred by algorithms based on mutual information and bivariate/multivariate transfer entropy. The validation is performed on two macaque connectomes and on synthetic networks with various topologies (regular lattice, small-world, random, scale-free, modular). Simulations are based on a neural mass model and on autoregressive dynamics (employing Gaussian estimators for direct comparison to functional connectivity and Granger causality). We find that multivariate transfer entropy captures key properties of all network structures for longer time series. Bivariate methods can achieve higher recall (sensitivity) for shorter time series but are unable to control false positives (lower specificity) as available data increases. This leads to overestimated clustering, small-world, and rich-club coefficients, underestimated shortest path lengths and hub centrality, and fattened degree distribution tails. Caution should therefore be used when interpreting network properties of functional connectomes obtained via correlation or pairwise statistical dependence measures, rather than more holistic (yet data-hungry) multivariate models.

| S-EPMC8233116 | biostudies-literature

Mutual information rate and bounds for it.

Project description:The amount of information exchanged per unit of time between two nodes in a dynamical network or between two data sets is a powerful concept for analysing complex systems. This quantity, known as the mutual information rate (MIR), is calculated from the mutual information, which is rigorously defined only for random systems. Moreover, the definition of mutual information is based on probabilities of significant events. This work offers a simple alternative way to calculate the MIR in dynamical (deterministic) networks or between two time series (not fully deterministic), and to calculate its upper and lower bounds without having to calculate probabilities, but rather in terms of well known and well defined quantities in dynamical systems. As possible applications of our bounds, we study the relationship between synchronisation and the exchange of information in a system of two coupled maps and in experimental networks of coupled oscillators.

| S-EPMC3480398 | biostudies-literature

Mutual information for testing gene-environment interaction.

Project description:Despite current enthusiasm for investigation of gene-gene interactions and gene-environment interactions, the essential issue of how to define and detect gene-environment interactions remains unresolved. In this report, we define gene-environment interactions as a stochastic dependence in the context of the effects of the genetic and environmental risk factors on the cause of phenotypic variation among individuals. We use mutual information that is widely used in communication and complex system analysis to measure gene-environment interactions. We investigate how gene-environment interactions generate the large difference in the information measure of gene-environment interactions between the general population and a diseased population, which motives us to develop mutual information-based statistics for testing gene-environment interactions. We validated the null distribution and calculated the type 1 error rates for the mutual information-based statistics to test gene-environment interactions using extensive simulation studies. We found that the new test statistics were more powerful than the traditional logistic regression under several disease models. Finally, in order to further evaluate the performance of our new method, we applied the mutual information-based statistics to three real examples. Our results showed that P-values for the mutual information-based statistics were much smaller than that obtained by other approaches including logistic regression models.

| S-EPMC2642626 | biostudies-literature

OmicsDI is part of the ELIXIR infrastructure

OmicsDI is an Elixir interoperability service. Learn more ›

Tweets

OmicsDI Databases

PRIDE
PeptideAtlas
MassIVE
JPOST Repository
Physiome Model Repository

EGA
EVA
ENA
LINCS
PAXDB
Cell Collective

MetaboLights
Metabolomics Workbench
MetabolomeExpress
GNPS
BioModels
FAIRDOMHub

ArrayExpress
dbGaP
ExpressionAtlas
GEO
NODE

Information

Databases
Help
API
Contact us
Code on GitHub
Terms of use
Submit Data