
Dataset Information


Deciphering the language of antibodies using self-supervised learning


ABSTRACT:

Summary

An individual's B cell receptor (BCR) repertoire encodes information about past immune responses and the potential for future disease protection. Deciphering the information stored in BCR sequence datasets will transform our understanding of disease and enable the discovery of novel diagnostics and antibody therapeutics. A key challenge of BCR sequence analysis is the prediction of BCR properties from the amino acid sequence alone. Here, we present an antibody-specific language model, Antibody-specific Bidirectional Encoder Representation from Transformers (AntiBERTa), which provides a contextualized representation of BCR sequences. Following pre-training, we show that AntiBERTa embeddings capture biologically relevant information that generalizes to a range of applications. As a case study, we fine-tune AntiBERTa to predict paratope positions from an antibody sequence, outperforming public tools across multiple metrics. To our knowledge, AntiBERTa is the deepest protein-family-specific language model, providing a rich representation of BCRs. AntiBERTa embeddings are primed for multiple downstream tasks and can improve our understanding of the language of antibodies.

Highlights

• AntiBERTa is an antibody-specific transformer model for representation learning
• AntiBERTa embeddings capture aspects of antibody function
• Attention maps of AntiBERTa correspond to structural contacts and binding sites
• AntiBERTa can be fine-tuned for state-of-the-art paratope prediction

The bigger picture

Understanding antibody function is critical for deciphering the biology of disease and for discovering novel therapeutic antibodies. The challenge is the vast diversity of antibody variants relative to the limited labeled data available. We overcome this challenge by using self-supervised learning to train a large antibody-specific language model, followed by transfer learning to fine-tune the model for predicting information related to antibody function. We first demonstrate the model's success with leading results in antibody binding site prediction. The model is amenable to further fine-tuning for diverse applications, improving our understanding of antibody function.

Antibodies are guardians of the adaptive immune system, with over one billion variants in a single individual. Understanding antibody function is critical for deciphering the biology of disease and for discovering novel therapeutics. Here, we present AntiBERTa, a deep language model that learns the features and syntax, or "language," of antibodies. We demonstrate the model's capacity through a range of tasks, such as tracing the B cell origin of an antibody, quantifying immunogenicity, and predicting the antibody's binding site.
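The summary describes self-supervised pre-training of a BERT-style encoder on BCR amino acid sequences. Below is a minimal sketch of that idea using the HuggingFace transformers library; the vocabulary layout, model size, example sequences, and hyperparameters are illustrative assumptions, not AntiBERTa's published configuration.

```python
# Minimal sketch of BERT-style masked-language-model (MLM) pre-training on
# antibody (BCR) sequences. Vocabulary layout, model size, and example
# sequences are illustrative assumptions, not AntiBERTa's published setup.
import torch
from transformers import BertConfig, BertForMaskedLM

# Vocabulary: 20 standard amino acids plus special tokens (assumed layout).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, CLS, SEP, MASK = 0, 1, 2, 3
VOCAB = {aa: i + 4 for i, aa in enumerate(AMINO_ACIDS)}

def encode(seq: str, max_len: int = 160) -> torch.Tensor:
    """Tokenize one antibody sequence as [CLS] residues [SEP] + padding."""
    ids = [CLS] + [VOCAB[aa] for aa in seq] + [SEP]
    ids += [PAD] * (max_len - len(ids))
    return torch.tensor(ids[:max_len])

def mask_tokens(input_ids: torch.Tensor, mlm_prob: float = 0.15):
    """Standard MLM objective: hide ~15% of residues, predict them back."""
    labels = input_ids.clone()
    special = input_ids <= MASK                 # never mask PAD/CLS/SEP
    probs = torch.full(labels.shape, mlm_prob)
    probs.masked_fill_(special, 0.0)
    masked = torch.bernoulli(probs).bool()
    labels[~masked] = -100                      # loss only on masked positions
    corrupted = input_ids.clone()
    corrupted[masked] = MASK
    return corrupted, labels

config = BertConfig(
    vocab_size=4 + len(AMINO_ACIDS),
    hidden_size=256,            # illustrative; the paper's model is larger
    num_hidden_layers=4,
    num_attention_heads=4,
    pad_token_id=PAD,
)
model = BertForMaskedLM(config)

# One illustrative training step on a toy batch of heavy-chain fragments.
batch = torch.stack([encode("EVQLVESGGGLVQPGGSLRLSCAAS"),
                     encode("QVQLQESGPGLVKPSETLSLTCTVS")])
inputs, labels = mask_tokens(batch)
attention_mask = (batch != PAD).long()
loss = model(input_ids=inputs, attention_mask=attention_mask,
             labels=labels).loss
loss.backward()   # an optimizer step would follow in a real training loop
print(f"MLM loss: {loss.item():.3f}")
```

After pre-training on unlabeled repertoire data, the encoder's hidden states serve as the contextualized BCR embeddings the abstract refers to.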
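The case study frames paratope prediction as a fine-tuning task on the pre-trained encoder. A natural reading is per-residue binary token classification; the sketch below shows that framing with an assumed two-label scheme and random toy data, and is not the paper's exact head architecture.

```python
# Hedged sketch of fine-tuning for paratope prediction as per-residue
# binary token classification. Label scheme and head are assumptions.
import torch
from transformers import BertConfig, BertForTokenClassification

# In practice the encoder weights would be loaded from the pre-trained MLM
# checkpoint (via BertForTokenClassification.from_pretrained); here we
# instantiate from a matching config so the sketch runs standalone.
config = BertConfig(
    vocab_size=24,            # 20 amino acids + 4 special tokens, as above
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,
    num_labels=2,             # 0 = non-paratope, 1 = paratope residue
)
clf = BertForTokenClassification(config)

# Toy batch of token IDs (amino-acid IDs occupy 4..23 in the assumed vocab).
input_ids = torch.randint(4, 24, (2, 32))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, 2, (2, 32))     # per-residue binding-site labels

out = clf(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
out.loss.backward()                       # fine-tunes the encoder end to end
paratope_probs = out.logits.softmax(-1)[..., 1]  # P(residue is in paratope)
print(paratope_probs.shape)               # (batch, seq_len)
```

Starting from pre-trained rather than random encoder weights is what makes this transfer-learning step effective despite the limited labeled paratope data noted in the abstract.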

SUBMITTER: Leem J 

PROVIDER: S-EPMC9278498 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

S-EPMC8184636 | biostudies-literature
S-EPMC10774911 | biostudies-literature
S-EPMC3970055 | biostudies-other
S-EPMC8694357 | biostudies-literature
S-EPMC9185837 | biostudies-literature
S-EPMC8654960 | biostudies-literature
S-EPMC8996628 | biostudies-literature
2023-11-01 | GSE244807 | GEO
S-EPMC8589823 | biostudies-literature
S-EPMC7248915 | biostudies-literature