
Dataset Information


Filtering Next-Generation Sequencing of the Ig Gene Repertoire Data Using Antibody Structural Information.


ABSTRACT: Next-generation sequencing of the Ig gene repertoire (Ig-seq) produces large volumes of information at the nucleotide sequence level. Such data have improved our understanding of immune systems across numerous species and have already been successfully applied in vaccine development and drug discovery. However, the high-throughput nature of Ig-seq means that it is afflicted by high error rates. This has led to the development of error-correction approaches. Computational error-correction methods use sequence information alone, primarily designating sequences as likely to be correct if they are observed frequently. In this work, we describe an orthogonal method for filtering Ig-seq data, which considers the structural viability of each sequence. A typical natural Ab structure requires the presence of a disulfide bridge within each of its variable chains to maintain the fold. Our Ab Sequence Selector (ABOSS) uses the presence/absence of this bridge as a way of both identifying structurally viable sequences and estimating the sequencing error rate. On simulated Ig-seq datasets, ABOSS is able to identify more than 99% of structurally viable sequences. Applying our method to six independent Ig-seq datasets (one mouse and five human), we show that our error calculations are in line with previous experimental and computational error estimates. We also show how ABOSS is able to identify structurally impossible sequences missed by other error-correction methods.
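The core filtering idea in the abstract is that a variable domain lacking either cysteine of its conserved disulfide bridge is structurally implausible. The sketch below illustrates that idea, assuming the sequences are already numbered with the IMGT unique numbering scheme, where the two bridge cysteines sit at positions 23 (1st-CYS) and 104 (2nd-CYS). The input format (a dict from IMGT position to amino acid) and the helper names `has_disulfide_pair`, `filter_viable` and `naive_cys_error_rate` are hypothetical. The error estimate is a deliberately naive stand-in that counts every non-Cys residue at those positions as an error; it is not ABOSS's published estimator.

```python
from typing import Dict, Iterable, List, Tuple

# IMGT unique numbering: conserved cysteines of the intra-domain
# disulfide bridge are at positions 23 (1st-CYS) and 104 (2nd-CYS).
CONSERVED_CYS_POSITIONS: Tuple[int, int] = (23, 104)

# Hypothetical input format: IMGT position -> single-letter amino acid.
NumberedDomain = Dict[int, str]


def has_disulfide_pair(domain: NumberedDomain) -> bool:
    """True if both conserved positions carry a cysteine."""
    return all(domain.get(pos) == "C" for pos in CONSERVED_CYS_POSITIONS)


def filter_viable(domains: Iterable[NumberedDomain]) -> List[NumberedDomain]:
    """Keep only domains that could form the canonical disulfide bridge."""
    return [d for d in domains if has_disulfide_pair(d)]


def naive_cys_error_rate(domains: Iterable[NumberedDomain]) -> float:
    """Fraction of observed conserved-cysteine positions that are not Cys.

    Simplifying assumption: every deviation at these positions is a
    sequencing error (no genuine somatic loss of the cysteines).
    """
    observed = 0
    mismatched = 0
    for d in domains:
        for pos in CONSERVED_CYS_POSITIONS:
            if pos in d:  # skip positions missing from the numbering
                observed += 1
                if d[pos] != "C":
                    mismatched += 1
    return mismatched / observed if observed else 0.0


if __name__ == "__main__":
    # Toy, made-up domains showing only the two positions of interest.
    example_reads = [
        {23: "C", 104: "C"},
        {23: "C", 104: "Y"},  # second cysteine lost
        {23: "C", 104: "C"},
        {23: "S", 104: "C"},  # first cysteine lost
    ]
    print(len(filter_viable(example_reads)))      # 2
    print(naive_cys_error_rate(example_reads))    # 0.25
```

In a real pipeline the numbered domains would come from an antibody numbering tool, and a per-residue rate measured at two positions would still need to be converted to a nucleotide-level error rate before comparison with the experimental estimates the abstract mentions.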

SUBMITTER: Kovaltsuk A 

PROVIDER: S-EPMC6485405 | biostudies-literature | 2018 Dec

REPOSITORIES: biostudies-literature


Publications

Filtering Next-Generation Sequencing of the Ig Gene Repertoire Data Using Antibody Structural Information.

Kovaltsuk A, Krawczyk K, Kelm S, Snowden J, Deane CM

Journal of Immunology (Baltimore, Md. : 1950), 2018-11-05, 12



Similar Datasets

| S-EPMC4011907 | biostudies-other
| S-EPMC7710313 | biostudies-literature
| S-EPMC4969583 | biostudies-literature
| S-EPMC4840318 | biostudies-literature
| S-EPMC5413556 | biostudies-literature
| S-EPMC4476701 | biostudies-literature
| S-EPMC5037392 | biostudies-literature
| S-EPMC3493122 | biostudies-literature
| S-EPMC9891242 | biostudies-literature
| S-EPMC4760936 | biostudies-literature