Dataset Information

Interpreting Neural Networks for Biological Sequences by Learning Stochastic Masks.


ABSTRACT: Sequence-based neural networks can learn to make accurate predictions from large biological datasets, but model interpretation remains challenging. Many existing feature attribution methods are optimized for continuous rather than discrete input patterns and assess individual feature importance in isolation, making them ill-suited for interpreting non-linear interactions in molecular sequences. Building on work in computer vision and natural language processing, we developed an approach based on deep learning - Scrambler networks - wherein the most salient sequence positions are identified with learned input masks. Scramblers learn to predict Position-Specific Scoring Matrices (PSSMs) where unimportant nucleotides or residues are scrambled by raising their entropy. We apply Scramblers to interpret the effects of genetic variants, uncover non-linear interactions between cis-regulatory elements, explain binding specificity for protein-protein interactions, and identify structural determinants of de novo designed proteins. We show that Scramblers enable efficient attribution across large datasets and result in high-quality explanations, often outperforming state-of-the-art methods.
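The scrambling idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`scramble_pssm`, `entropy_bits`, `sample_sequence`), the uniform background, and the specific rule of adding an importance-scaled one-hot to background log-probabilities are all assumptions made for illustration. In a real Scrambler the importance scores would be produced by a trained network; here they are set by hand.

```python
import numpy as np

def scramble_pssm(one_hot, importance, background=None):
    """Turn a one-hot sequence into a PSSM (illustrative rule, assumed here).

    one_hot:    (L, A) array with a single 1 per row.
    importance: (L,) non-negative scores; 0 leaves only the background.
    background: (A,) symbol probabilities; uniform if omitted.
    """
    one_hot = np.asarray(one_hot, dtype=float)
    importance = np.asarray(importance, dtype=float)
    n_sym = one_hot.shape[1]
    if background is None:
        background = np.full(n_sym, 1.0 / n_sym)
    # Boost the observed symbol's logit in proportion to its importance.
    logits = np.log(background)[None, :] + importance[:, None] * one_hot
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def entropy_bits(pssm):
    """Per-position Shannon entropy in bits."""
    p = np.clip(pssm, 1e-12, 1.0)
    return -(p * np.log2(p)).sum(axis=1)

def sample_sequence(pssm, rng):
    """Draw one symbol index per position from the PSSM."""
    return np.array([rng.choice(pssm.shape[1], p=row) for row in pssm])

alphabet = "ACGT"
seq = "ACGT"
one_hot = np.array([[float(c == a) for a in alphabet] for c in seq])
importance = np.array([0.0, 10.0, 0.0, 10.0])  # hand-picked, hypothetical scores

pssm = scramble_pssm(one_hot, importance)
sample = sample_sequence(pssm, np.random.default_rng(0))
print(np.round(entropy_bits(pssm), 3))
print("".join(alphabet[i] for i in sample))
```

Positions with zero importance fall back to the background distribution (maximum entropy for a uniform background), while positions with high importance stay close to the original symbol, so sampled sequences preserve only the positions marked as salient.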

SUBMITTER: Linder J 

PROVIDER: S-EPMC9373874 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature

Publications

Interpreting Neural Networks for Biological Sequences by Learning Stochastic Masks.

Johannes Linder, Alyssa La Fleur, Zibo Chen, Ajasja Ljubetič, David Baker, Sreeram Kannan, Georg Seelig

Nature Machine Intelligence, 25 January 2022, issue 1


Similar Datasets

| S-EPMC6129303 | biostudies-literature
| S-EPMC8328518 | biostudies-literature
| S-EPMC6227809 | biostudies-literature
| S-EPMC7505196 | biostudies-literature
| S-EPMC5946947 | biostudies-literature
| S-EPMC11549864 | biostudies-literature
| S-EPMC7050519 | biostudies-literature
| S-EPMC9166289 | biostudies-literature
| S-EPMC7511202 | biostudies-literature
| S-EPMC7267841 | biostudies-literature