Project description: Macrocycles target proteins that are otherwise considered undruggable because of a lack of hydrophobic cavities and the presence of extended featureless surfaces. Increasing efforts by computational chemists have produced effective software to overcome the restrictions of torsional and conformational freedom that arise as a consequence of macrocyclization. Moloc is an efficient algorithm, with an emphasis on high interactivity, and has been constantly updated since 1986 by drug designers and crystallographers of the Roche biostructural community. In this work, we have benchmarked the shape-guided algorithm using a dataset of 208 macrocycles, carefully selected on the basis of structural complexity. We have quantified the accuracy, diversity, speed, exhaustiveness, and sampling efficiency in an automated fashion and compared them with four commercial (Prime, MacroModel, Molecular Operating Environment, and molecular dynamics) and four open-access (experimental-torsion distance geometry with additional "basic knowledge" alone and with Merck molecular force field minimization or universal force field minimization, Cambridge Crystallographic Data Centre conformer generator, and Conformator) packages. With three-quarters of the database processed below the threshold of high ring accuracy, Moloc was identified as having the highest sampling efficiency and exhaustiveness without producing thousands of conformations. It also offers random ring splitting into two half-loops and the possibility to interactively produce globular or flat conformations, with diversity similar to Prime, MacroModel, and molecular dynamics. The algorithm and the Python scripts for full automation of these parameters are freely available for academic use.
Project description: The alkylation of some secondary amide functions with a dimethoxybenzyl (DMB) group in oligomers of 8-amino-2-quinolinecarboxylic acid destabilizes the otherwise favored helical conformations, and allows for cyclization to take place. A cyclic hexamer and a cyclic heptamer were produced in this manner. After DMB removal, X-ray crystallography and NMR show that the macrocycles adopt strained conformations that would be improbable in noncyclic species. The high helix folding propensity of the main chain is partly expressed in these conformations, but it remains frustrated by macrocyclization. Despite being homomeric, the macrocycles possess inequivalent monomer units. Experimental and computational studies highlight specific fluxional pathways within these structures. Extensive simulated annealing molecular dynamics allow for the prediction of the conformations for larger macrocycles with up to sixteen monomers.
Project description: BACKGROUND: Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal. RESULTS: We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET, LISI, ASW, and ARI. We also investigate the use of batch-corrected data to study differential gene expression. CONCLUSION: Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
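Of the four metrics named in the benchmark above, the adjusted Rand index (ARI) is the simplest to write down. The following is a minimal sketch of the standard ARI formula in plain Python, not the benchmark's own evaluation code; the function name `adjusted_rand_index` and the toy labels are illustrative assumptions. It compares known cell type labels with cluster assignments obtained after batch correction, where 1.0 indicates perfect agreement and values near 0 indicate chance-level agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard adjusted Rand index between two labelings of the same cells.

    Hypothetical helper for illustration; not taken from the benchmark study.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: the labelings trivially agree

    # Contingency table cells and its row/column marginals.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs placed together under each view.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Toy example (assumed data): cluster IDs differ from the cell type names,
# but the grouping is identical, so the score is 1.0.
cell_types = ["T", "T", "B", "B"]
clusters = [1, 1, 0, 0]
score = adjusted_rand_index(cell_types, clusters)
```

ARI depends only on how cells are grouped, not on the label names, which is why it suits comparing clusterings against annotated cell types. It does not by itself measure batch mixing; the other listed metrics address that aspect.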
Project description: Artificial macrocycles have recently become popular as a novel research field in drug discovery. As opposed to their natural counterparts, artificial macrocycles promise better control over synthesizability and physicochemical properties, resulting in druglike properties. Very few synthetic methods allow for convergent, fast, yet diverse access to the large chemical space of macrocycles. One synthetic technology to access artificial macrocycles with potential biological activity, multicomponent reactions, is reviewed here, with a focus on our own work. We believe that synthetic chemists have to acquaint themselves more with structure and activity to leverage the design aspect of their daily work.
Project description: Background: The high error rate of next-generation sequencing (NGS) restricts some of its applications, such as monitoring virus mutations and detecting rare mutations in tumors. There are two commonly employed sequencing library preparation strategies to improve sequencing accuracy by correcting sequencing errors: the read-pairing method and the tag-clustering method (i.e. primer ID or UID). Here, we constructed a homogeneous library from a single clone, and compared the variant calling accuracy of these error-correction methods. Result: We comprehensively described the strengths and pitfalls of these methods. We found that both read-pairing and tag-clustering methods significantly decreased the sequencing error rate. While the read-pairing method was more effective than the tag-clustering method at correcting insertion and deletion errors, it was not as effective as the tag-clustering method at correcting substitution errors. In addition, we observed that when the read quality was poor, the tag-clustering method led to huge coverage loss. We also tested the effect of applying quality score filtering to the error-correction methods and demonstrated that quality score filtering yielded a minor, yet statistically significant, improvement in the error-correction methods tested in this study. Conclusion: Our study provides a benchmark for researchers to select suitable error-correction methods based on the goal of the experiment by balancing the trade-off between sequencing cost (i.e. sequencing coverage requirement) and detection sensitivity.
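The tag-clustering idea described above can be sketched as follows: reads sharing the same tag (UID) are grouped into a family, and a per-position majority vote yields one consensus sequence per family. This is a simplified illustration under stated assumptions, not the study's pipeline: the function name `consensus_by_tag`, the `min_family_size` threshold, and the toy reads are all hypothetical, and the sketch assumes aligned reads of equal length, so it only addresses substitution errors.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads, min_family_size=3):
    """Build one consensus sequence per tag by per-position majority vote.

    `reads` is an iterable of (tag, sequence) pairs. Families with fewer
    than `min_family_size` reads are dropped (an assumed, illustrative
    threshold), which is one way coverage is lost in this approach.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to vote reliably
        length = min(len(s) for s in seqs)  # guard against ragged input
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            # Require a strict majority; otherwise mark the position ambiguous.
            bases.append(base if 2 * count > len(seqs) else "N")
        consensus[tag] = "".join(bases)
    return consensus


# Toy example (assumed data): one read in family "AAT" carries a
# substitution error (G -> T at the third base); family "GGC" is too small.
toy_reads = [
    ("AAT", "ACGT"),
    ("AAT", "ACGT"),
    ("AAT", "ACTT"),
    ("GGC", "ACGT"),
]
result = consensus_by_tag(toy_reads)
```

In the toy input the substitution is outvoted, and the single-read family is discarded. Discarding small families mirrors the trade-off between coverage and sensitivity noted in the abstract: a stricter family-size threshold removes more errors but keeps fewer molecules.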
Project description: The potential utility of synthetic macrocycles (MCs) as drugs, particularly against low-druggability targets such as protein-protein interactions, has been widely discussed. There is little information, however, to guide the design of MCs for good target protein-binding activity or bioavailability. To address this knowledge gap, we analyze the binding modes of a representative set of MC-protein complexes. The results, combined with consideration of the physicochemical properties of approved macrocyclic drugs, allow us to propose specific guidelines for the design of synthetic MC libraries with structural and physicochemical features likely to favor strong binding to protein targets as well as good bioavailability. We additionally provide evidence that large, natural product-derived MCs can bind targets that are not druggable by conventional, drug-like compounds, supporting the notion that natural product-inspired synthetic MCs can expand the number of proteins that are druggable by synthetic small molecules.
Project description: Strain has a unique and sometimes unpredictable impact on the properties and reactivity of molecules. To thoroughly describe strain in molecules, a computational tool that relates strain energy to reactivity by localizing and quantifying strain was developed. Strain energy is calculated locally for every coordinate in the molecule, and areas of higher strain are shown experimentally to be more reactive. Not only does this tool directly compare strain energy in parts of the same molecule, but it also computes total strain to give a full picture of molecular strain energy. It is freely available to the public on GitHub under the name StrainViz, and much of the workflow is automated to simplify use for non-experts. Unique insight into the reactivity of curved aromatic molecules and strained alkyne bioorthogonal reagents is described within.