
Dataset Information


Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping.


ABSTRACT: Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost.
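The coverage requirement in the abstract can be made concrete with a short Python sketch. This is not the JokerCAKE algorithm, which the abstract does not specify. It shows two hypothetical helpers. The first, `de_bruijn`, is the classic no-joker baseline: a linearized de Bruijn sequence that contains every k-mer exactly once in |Σ|^k + k - 1 characters, so its length grows exponentially in k. The second, `kmer_coverage`, counts how often each k-mer is covered when the sequence may contain jokers. It assumes the joker is written as `N`, that a window with one joker covers every k-mer obtained by substituting that position, and that windows with two or more jokers count for nothing. The requirement "each k-mer appears at least p times" then becomes `min_coverage(seq, k) >= p`.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"
JOKER = "N"  # assumed joker symbol; stands for every character of ALPHABET


def de_bruijn(alphabet: str, k: int) -> str:
    """Classic Lyndon-word de Bruijn construction, linearized (baseline, NOT JokerCAKE).

    Every k-mer over `alphabet` appears exactly once; length is len(alphabet)**k + k - 1.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    seq: list[int] = []

    def db(t: int, p: int) -> None:
        if t > k:
            if k % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    cyclic = "".join(alphabet[i] for i in seq)
    return cyclic + cyclic[:k - 1]  # wrap around so the linear string covers all k-mers


def kmer_coverage(seq: str, k: int, alphabet: str = ALPHABET, joker: str = JOKER) -> Counter:
    """Count how many windows of `seq` cover each k-mer (at most one joker per window)."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        n_jokers = window.count(joker)
        if n_jokers == 0:
            counts[window] += 1
        elif n_jokers == 1:
            pos = window.index(joker)
            for c in alphabet:  # the joker stands for every character at this position
                counts[window[:pos] + c + window[pos + 1:]] += 1
        # windows with two or more jokers are ignored (assumed constraint)
    return counts


def min_coverage(seq: str, k: int, alphabet: str = ALPHABET, joker: str = JOKER) -> int:
    """Smallest number of times any k-mer over `alphabet` is covered; compare against p."""
    counts = kmer_coverage(seq, k, alphabet, joker)
    return min(counts.get("".join(t), 0) for t in product(alphabet, repeat=k))


if __name__ == "__main__":
    k = 3
    s = de_bruijn(ALPHABET, k)
    print(len(s), min_coverage(s, k))  # 66 1
    # One joker at position 10: the three windows overlapping it each cover 4 k-mers.
    s_joker = s[:10] + JOKER + s[11:]
    print(sum(kmer_coverage(s_joker, k).values()))  # 73
```

In this toy example, a single joker raises the total number of covered k-mer occurrences from 64 to 73 without lengthening the sequence. The abstract's algorithm exploits this effect to shorten libraries for a target p.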

SUBMITTER: Orenstein Y 

PROVIDER: S-EPMC5661997 | biostudies-literature | 2017 Sep

REPOSITORIES: biostudies-literature


Publications

Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping.

Orenstein Yaron, Puccinelli Robert, Kim Ryan, Fordyce Polly, Berger Bonnie

Cell Systems, 2017-09-01, 3



Similar Datasets

| S-EPMC5811841 | biostudies-other
2020-08-20 | GSE71700 | GEO
| S-EPMC2246154 | biostudies-literature
| S-EPMC8141684 | biostudies-literature
| S-EPMC1242346 | biostudies-literature
| S-EPMC8088326 | biostudies-literature
| S-EPMC6041316 | biostudies-literature
| S-EPMC6437963 | biostudies-literature
2023-07-18 | GSE223086 | GEO
| S-EPMC8218311 | biostudies-literature