
Dataset Information


Ligand biological activity predicted by cleaning positive and negative chemical correlations.


ABSTRACT: Predicting ligand biological activity is a key challenge in drug discovery. Ligand-based statistical approaches are often hampered by noise due to undersampling: The number of molecules known to be active or inactive is vastly less than the number of possible chemical features that might determine binding. We derive a statistical framework inspired by random matrix theory and combine the framework with high-quality negative data to discover important chemical differences between active and inactive molecules by disentangling undersampling noise. Our model outperforms standard benchmarks when tested against a set of challenging retrospective tests. We prospectively apply our model to the human muscarinic acetylcholine receptor M1, finding four experimentally confirmed agonists that are chemically dissimilar to all known ligands. The hit rate of our model is significantly higher than the state of the art. Our model can be interpreted and visualized to offer chemical insights about the molecular motifs that are synergistic or antagonistic to M1 agonism, which we have prospectively experimentally verified.
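The abstract does not spell out how the undersampling noise is removed. As a rough illustration only, here is one standard random-matrix-theory technique, Marchenko-Pastur eigenvalue clipping. The sketch fits the active and inactive molecule sets separately and scores a molecule by comparing the two. Several pieces are hypothetical assumptions and not the authors' published model: the 0/1 fingerprint input format, the function names `mp_upper_edge`, `fit_cleaned_model`, `signal_energy` and `activity_score`, and the active-minus-inactive scoring rule.

```python
import numpy as np


def mp_upper_edge(n_samples: int, n_features: int) -> float:
    """Upper edge (1 + sqrt(p/n))^2 of the Marchenko-Pastur spectrum for a
    correlation matrix estimated from n samples of p independent features."""
    q = n_features / n_samples
    return (1.0 + np.sqrt(q)) ** 2


def fit_cleaned_model(X: np.ndarray):
    """Keep only eigenvectors of the feature correlation matrix whose
    eigenvalues exceed the Marchenko-Pastur noise edge.

    X: (n_molecules, n_bits) 0/1 fingerprint matrix (hypothetical input format).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # constant bits carry no correlation; avoid 0/0
    Z = (X - mu) / sd
    C = Z.T @ Z / n  # empirical correlation matrix, shape (p, p)
    evals, evecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    keep = evals > mp_upper_edge(n, p)  # eigenvalues inside the bulk are treated as noise
    return mu, sd, evecs[:, keep]


def signal_energy(x: np.ndarray, model) -> float:
    """Squared projection of a standardized fingerprint onto the kept
    (signal) eigenvectors of one model."""
    mu, sd, V = model
    z = (np.asarray(x, dtype=float) - mu) / sd
    return float(np.sum((V.T @ z) ** 2))


def activity_score(x, active_model, inactive_model) -> float:
    """Hypothetical score: signal energy under the active-set model minus
    signal energy under the inactive-set model."""
    return signal_energy(x, active_model) - signal_energy(x, inactive_model)
```

With 200 molecules and 50 fingerprint bits, `mp_upper_edge(200, 50)` is 2.25. A correlated motif shared by the actives then stands out as an eigenvalue above that edge, while eigenvalues that fall inside the noise bulk are discarded. The negative (inactive) data enters through the second model. This mirrors the abstract's point that both positive and negative correlations need cleaning.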

SUBMITTER: Lee AA 

PROVIDER: S-EPMC6397557 | biostudies-literature | 2019 Feb

REPOSITORIES: biostudies-literature


Publications

Ligand biological activity predicted by cleaning positive and negative chemical correlations.

Lee AA, Yang Q, Bassyouni A, Butler CR, Hou X, Jenkinson S, Price DA

Proceedings of the National Academy of Sciences of the United States of America, 2019 Feb 11, issue 9



Similar Datasets

| S-EPMC5780839 | biostudies-literature
| S-EPMC7688242 | biostudies-literature
| S-EPMC2732283 | biostudies-literature
| S-EPMC2728800 | biostudies-literature
| PRJNA930680 | ENA
| S-EPMC20542 | biostudies-literature
| S-EPMC5994333 | biostudies-literature
| S-EPMC3036769 | biostudies-literature
| S-EPMC7444502 | biostudies-literature
| S-EPMC4797711 | biostudies-other