
Dataset Information


Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations.


ABSTRACT: Biophysical interactions between proteins and peptides are key determinants of molecular recognition specificity landscapes. However, an understanding of how molecular structure and residue-level energetics at protein-peptide interfaces shape these landscapes remains elusive. We combine information from yeast-based library screening, next-generation sequencing, and structure-based modeling in a supervised machine learning approach to report the comprehensive sequence-energetics-function mapping of the specificity landscape of the hepatitis C virus (HCV) NS3/4A protease, whose function (site-specific cleavages of the viral polyprotein) is a key determinant of viral fitness. We screened a library of substrates in which five residue positions were randomized and measured the cleavability of ~30,000 substrates (~1% of the library) using yeast display and fluorescence-activated cell sorting followed by deep sequencing. Structure-based models of a subset of experimentally derived sequences were used in a supervised learning procedure to train a support vector machine to predict the cleavability of 3.2 million substrate variants by the HCV protease. The resulting landscape allows identification of previously unidentified HCV protease substrates, and graph-theoretic analyses reveal extensive clustering of cleavable and uncleavable motifs in sequence space. Specificity landscapes of known drug-resistant variants are similarly clustered. The described approach should enable the elucidation and redesign of specificity landscapes of a wide variety of proteases, including human-origin enzymes. Our results also suggest a possible role for residue-level energetics in shaping plateau-like functional landscapes predicted from viral quasispecies theory.
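The library figures and the "clustering in sequence space" claim in the abstract can be illustrated with a small sketch. It assumes that each of the five randomized positions can hold any of the 20 canonical amino acids, which gives 20^5 = 3.2 million variants; ~30,000 screened substrates is then about 0.94% of that library. It also assumes that "clustering" can be read as connected components in a graph whose edges join sequences differing by a single substitution. The paper's actual graph construction may differ. The function names `single_mutants` and `clusters` and the toy labels are hypothetical and are not data from the study.

```python
from collections import deque

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: 20 canonical residues per position
N_RANDOMIZED = 5  # number of randomized substrate positions, per the abstract

# Library size: 20 choices at each of 5 positions.
library_size = len(AMINO_ACIDS) ** N_RANDOMIZED  # 3,200,000
screened_fraction = 30_000 / library_size  # about 0.94%, i.e. "~1%"


def single_mutants(seq):
    """Yield every sequence differing from seq at exactly one position (hypothetical helper)."""
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield seq[:i] + aa + seq[i + 1:]


def clusters(labels, target):
    """Return connected components of sequences whose label equals target.

    Two sequences are joined by an edge when they are one point mutation
    apart (Hamming distance 1). Uses breadth-first search over a set.
    """
    members = {s for s, lab in labels.items() if lab == target}
    seen = set()
    components = []
    for start in sorted(members):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            s = queue.popleft()
            component.append(s)
            # Only neighbors that carry the same label extend the cluster.
            for nbr in single_mutants(s):
                if nbr in members and nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(sorted(component))
    return components


# Toy labels (hypothetical, not from the dataset): True = cleavable.
toy_labels = {
    "AAAAA": True, "AAAAC": True, "CAAAC": True, "WWWWW": True,
    "GGGGG": False, "GGGGP": False,
}
print(clusters(toy_labels, True))   # [['AAAAA', 'AAAAC', 'CAAAC'], ['WWWWW']]
print(clusters(toy_labels, False))  # [['GGGGG', 'GGGGP']]
```

Each sequence has 5 × 19 = 95 single-substitution neighbors. Because membership is checked against a set, each BFS step costs a fixed 95 lookups, and the sketch scales roughly linearly with the number of labeled sequences, including a full 3.2-million-variant prediction table.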

SUBMITTER: Pethe MA 

PROVIDER: S-EPMC6320525 | biostudies-literature | 2019 Jan

REPOSITORIES: biostudies-literature


Publications

Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations.

Manasi A. Pethe, Aliza B. Rubenstein, Sagar D. Khare

Proceedings of the National Academy of Sciences of the United States of America (2018-12-26; issue 1)



Similar Datasets

2019-11-13 | GSE140262 | GEO
| S-EPMC9922274 | biostudies-literature
| PRJNA589061 | ENA
| S-EPMC6550282 | biostudies-literature
| S-EPMC4345474 | biostudies-literature
| S-EPMC10076051 | biostudies-literature
| S-EPMC9235491 | biostudies-literature
| S-EPMC9307817 | biostudies-literature
| S-EPMC8506936 | biostudies-literature
| S-EPMC7551840 | biostudies-literature