Project description:Synthesis of a polymer composed of a large discrete number of chemically distinct monomers in an absolutely defined aperiodic sequence remains a challenge in polymer chemistry. The synthesis has largely been limited to oligomers having a limited number of repeating units due to the difficulties associated with the step-by-step addition of individual monomers to achieve high molecular weights. Here we report the copolymers of α-hydroxy acids, poly(phenyllactic-co-lactic acid) (PcL) built via the cross-convergent method from four dyads of monomers as constituent units. Our proposed method allows scalable synthesis of sequence-defined PcL in a minimal number of coupling steps from reagents in stoichiometric amounts. Digital information can be stored in an aperiodic sequence of PcL, which can be fully retrieved as binary code by mass spectrometry sequencing. The information storage density (bit/Da) of PcL is 50% higher than DNA, and the storage capacity of PcL can also be increased by adjusting the molecular weight (~38 kDa).
Project description:MOTIVATION:Repeat proteins, which contain multiple repeats of short sequence motifs, form a large but seldom-studied group of proteins. Methods focusing on the analysis of 3D structures of such proteins identified many subtle effects in length distribution of individual motifs that are important for their functions. However, similar analysis was yet not applied to the vast majority of repeat proteins with unknown 3D structures, mostly because of the extreme diversity of the underlying motifs and the resulting difficulty to detect those. RESULTS:We developed FAIT, a sequence-based algorithm for the precise assignment of individual repeats in repeat proteins and introduced a framework to classify and compare aperiodicity patterns for large protein families. FAIT extracts repeat positions by post-processing FFAS alignment matrices with image processing methods. On examples of proteins with Leucine Rich Repeat (LRR) domains and other solenoids like proteins, we show that the automated analysis with FAIT correctly identifies exact lengths of individual repeats based entirely on sequence information. AVAILABILITY AND IMPLEMENTATION:https://github.com/GodzikLab/FAIT CONTACT: adam@godziklab.org SUPPLEMENTARY INFORMATION:Supplementary data are available at Bioinformatics online.
Project description:Due to its longevity and enormous information density, DNA is an attractive medium for archival storage. The current hamstring of DNA data storage systems-both in cost and speed-is synthesis. The key idea for breaking this bottleneck pursued in this work is to move beyond the low-error and expensive synthesis employed almost exclusively in today's systems, towards cheaper, potentially faster, but high-error synthesis technologies. Here, we demonstrate a DNA storage system that relies on massively parallel light-directed synthesis, which is considerably cheaper than conventional solid-phase synthesis. However, this technology has a high sequence error rate when optimized for speed. We demonstrate that even in this high-error regime, reliable storage of information is possible, by developing a pipeline of algorithms for encoding and reconstruction of the information. In our experiments, we store a file containing sheet music of Mozart, and show perfect data recovery from low synthesis fidelity DNA.