Dataset Information

Updated benchmarking of variant effect predictors using deep mutational scanning.

ABSTRACT: The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top-performing VEPs are unsupervised methods including EVE, DeepSequence and ESM-1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking.

SUBMITTER: Livesey BJ

PROVIDER: S-EPMC10407742 | biostudies-literature | 2023 Aug

REPOSITORIES: biostudies-literature

Publications

Updated benchmarking of variant effect predictors using deep mutational scanning.

Livesey, Benjamin J; Marsh, Joseph A

Molecular Systems Biology, 2023-06-13, issue 8

PMID: 37310135

Similar Datasets

Project description: SLCO1B1 (solute carrier organic anion transporter family member 1B1) is an important transmembrane hepatic uptake transporter. Genetic variants in the SLCO1B1 gene have been associated with altered protein folding, resulting in protein degradation and decreased transporter activity. Next-generation sequencing (NGS) of pharmacogenes is being applied increasingly to associate variation in drug response with genetic sequence variants. However, it is difficult to link variants of unknown significance with functional phenotypes using "one-at-a-time" functional systems. Deep mutational scanning (DMS) using a "landing pad cell-based system" is a high-throughput technique designed to analyze hundreds of gene open reading frame (ORF) missense variants in a parallel and scalable fashion. We have applied DMS to analyze 137 missense variants in the SLCO1B1 ORF obtained from the Exome Aggregation Consortium project. ORFs containing these variants were fused to green fluorescent protein and were integrated into "landing pad" cells. Fluorescence-activated cell sorting was performed to separate the cells into four groups based on fluorescence readout indicating protein expression at the single cell level. NGS was then performed and SLCO1B1 variant frequencies were used to determine protein abundance. We found that six variants not previously characterized functionally displayed less than 25% and another 12 displayed approximately 50% of wild-type protein expression. These results were then functionally validated by transporter studies. Severely damaging variants identified by DMS may have clinical relevance for SLCO1B1-dependent drug transport, but we need to exercise caution since the relatively small number of severely damaging variants identified raises questions with regard to the application of DMS to intrinsic membrane proteins such as organic anion transporter protein 1B1.
SIGNIFICANCE STATEMENT: The functional implications of a large number of open reading frame (ORF) "variants of unknown significance" (VUS) in transporter genes have not been characterized. This study applied deep mutational scanning to determine the functional effects of VUS that have been observed in the ORF of SLCO1B1 (solute carrier organic anion transporter family member 1B1). Several severely damaging variants were identified, studied, and validated. These observations have implications for both the application of deep mutational scanning to intrinsic membrane proteins and for the clinical effect of drugs and endogenous compounds transported by SLCO1B1.
