Project description: Generating molecules with desired properties is an important task in chemistry and pharmacy. An efficient method could help in finding drugs to treat diseases such as COVID-19, and data mining and artificial intelligence are promising routes to such a method. Recently, both deep-learning-based generative models and genetic-algorithm-based approaches have made progress in generating molecules and optimizing their properties. However, existing methods still need improvement in both efficiency and performance. To address these problems, we propose the Chemical Genetic Algorithm for Large Molecular Space (CALM). Specifically, CALM employs a scalable and efficient molecular representation called the molecular matrix. We then design corresponding crossover, mutation, and mask operators inspired by domain knowledge and previous studies. We apply our genetic algorithm to several tasks in molecular property optimization and constrained molecular optimization. The results show that our approach outperforms other state-of-the-art deep learning and genetic algorithm methods, and z tests performed on the results of several experiments show that the improvements are significant at the 99% confidence level. Based on the experimental results, we also point out shortcomings in the current experimental evaluation standards that hinder a fair evaluation of previous work. Supplementary information: The online version contains supplementary material available at 10.1007/s11390-021-0970-3.
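The abstract reports z tests but does not state which variant was used. As a minimal sketch, assuming a one-sided two-sample z test on per-run scores of two methods (the function name and the scores below are hypothetical, not data from the study), a one-sided p-value below 0.01 corresponds to significance at the 99% level:

```python
import math
from statistics import mean, stdev


def two_sample_z_test(a, b):
    """One-sided two-sample z test for H1: mean(a) > mean(b).

    Sample standard deviations stand in for the population values,
    i.e. the usual large-sample normal approximation.
    """
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    # P(Z > z) for a standard normal variable
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Hypothetical per-run scores for two optimizers (not data from the study)
new_method = [0.90, 0.92, 0.91, 0.93, 0.89]
baseline = [0.80, 0.82, 0.79, 0.81, 0.83]
z, p = two_sample_z_test(new_method, baseline)
print(f"z = {z:.2f}, one-sided p = {p:.2e}, significant at 99%: {p < 0.01}")
```

With only five runs per method the normal approximation is rough; a t test would be the more conservative choice for samples this small.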
Project description: Natural products are a diverse class of compounds with promising biological properties, such as high potency and excellent selectivity. However, their structural motifs differ from those of typical drug-like compounds: they span a wider range of molecular weights, often have multiple stereocenters, and have a higher fraction of sp3-hybridized carbons. This makes natural products difficult to encode with molecular fingerprints, which restricts their use in cheminformatics studies. To tackle this issue, we drew on over 30 years of research to systematically evaluate which molecular fingerprints perform best on the natural product chemical space. We considered 20 molecular fingerprints from four different sources and benchmarked them on over 100,000 unique natural products from the COCONUT (COlleCtion of Open Natural prodUcTs) and CMNPD (Comprehensive Marine Natural Products Database) databases. Our analysis focused on the correlation between different fingerprints and on their classification performance on 12 bioactivity prediction datasets. Our results show that different encodings can provide fundamentally different views of the natural product chemical space, leading to substantial differences in pairwise similarity and in performance. While Extended Connectivity Fingerprints are the de facto option for encoding drug-like compounds, other fingerprints were found to match or outperform them for bioactivity prediction of natural products. These results highlight the need to evaluate multiple fingerprinting algorithms for optimal performance and suggest new areas of research. Finally, we provide an open-source Python package for computing all molecular fingerprints considered in the study, as well as the data and scripts necessary to reproduce the results, at https://github.com/dahvida/NP_Fingerprints.
Project description: We define bipartite and monopartite relational networks of chemical elements and compounds using two different datasets of inorganic chemical and material compounds, and study their topology. We find that the connectivity between elements and compounds follows an exponential distribution for materials and a fat-tailed distribution for chemicals. Compound networks show similar degree distributions and feature a highly connected club driven by oxygen. Chemical compound networks appear more modular than material ones, and the detected communities reveal different dominant elements specific to each topology. We successfully reproduce the connectivity of the empirical chemical and material networks using a family of fitness models, where the fitness values are derived from the abundances of the elements in the aggregate compound data. Our results pave the way towards a relational network-based understanding of the inherent complexity of the vast chemical knowledge atlas, and our methodology can be applied to other systems with an ingredient-composite structure.
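A minimal sketch of the two network views, assuming each compound is reduced to the set of elements it contains (the toy compounds below are illustrative, not taken from either dataset): element degrees in the bipartite element-compound network, their degree distribution, and a weighted monopartite projection in which two elements are linked when they co-occur in a compound. Fitting the fitness models is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each compound is reduced to the set of elements it contains
compounds = {
    "H2O": {"H", "O"},
    "CO2": {"C", "O"},
    "NaCl": {"Na", "Cl"},
    "NaOH": {"Na", "O", "H"},
    "CH4": {"C", "H"},
}


def element_degrees(compounds: dict[str, set[str]]) -> Counter:
    """Degree of each element node in the bipartite element-compound network."""
    return Counter(e for elems in compounds.values() for e in elems)


def degree_distribution(degrees: Counter) -> Counter:
    """Map each degree k to the number of nodes with degree k."""
    return Counter(degrees.values())


def element_projection(compounds: dict[str, set[str]]) -> Counter:
    """Monopartite element network; edge weight = number of compounds shared by two elements."""
    edges = Counter()
    for elems in compounds.values():
        for a, b in combinations(sorted(elems), 2):
            edges[(a, b)] += 1
    return edges


degrees = element_degrees(compounds)
print(degrees.most_common(2))  # H and O, each appearing in 3 compounds
print(degree_distribution(degrees))
print(element_projection(compounds)[("H", "O")])  # 2 (H2O and NaOH)
```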
Project description: Herein we review our recent efforts in searching for bioactive ligands by enumeration and virtual screening of the unknown chemical space of small molecules. Enumeration from first principles shows that almost all small molecules (>99.9%) have never been synthesized and are still available to be prepared and tested. We discuss open access sources of molecules, the classification and representation of chemical space using molecular quantum numbers (MQN), its exhaustive enumeration in form of the chemical universe generated databases (GDB), and examples of using these databases for prospective drug discovery. MQN-searchable GDB, PubChem, and DrugBank are freely accessible at www.gdb.unibe.ch.
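MQN are 42 integer-valued descriptors (counts of atoms, bonds, polar groups, and topological features), and similarity searches in MQN space are commonly performed with the city-block (Manhattan) distance. The sketch below assumes the vectors are already computed (RDKit, for example, provides `rdMolDescriptors.MQNs_`); the compound IDs and the truncated five-value vectors are hypothetical:

```python
def city_block(u: list[int], v: list[int]) -> int:
    """City-block (Manhattan) distance between two descriptor vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("descriptor vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_neighbours(query: list[int], library: dict[str, list[int]], k: int = 3) -> list[str]:
    """Return the IDs of the k library entries closest to the query in MQN space."""
    ranked = sorted(library, key=lambda cid: city_block(query, library[cid]))
    return ranked[:k]


# Hypothetical, truncated 5-value vectors; real MQN vectors have 42 entries
library = {
    "A": [10, 1, 0, 2, 1],
    "B": [12, 1, 1, 2, 0],
    "C": [30, 5, 4, 0, 3],
    "D": [11, 2, 0, 2, 1],
}
print(nearest_neighbours([10, 1, 0, 2, 1], library, k=2))  # ['A', 'D']
```

Because every MQN entry is a count, the city-block distance is simply the total number of unit changes separating two molecules' descriptor profiles.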
Project description: Propolis is a natural resinous material produced by bees that has been used in folk medicine since ancient times. Because it possesses a broad spectrum of biological activities, it has gained significant scientific and commercial interest over the last two decades. By searching 122 publications reported up to the end of 2019, we assembled a unique database of 578 compounds isolated from both honey bee propolis and stingless bee propolis, and analyzed the chemical space and chemical diversity of these compounds. The results demonstrate that both honey bee propolis and stingless bee propolis are valuable sources for pharmaceutical and nutraceutical development.
Project description: The fight against the emergence of mutant influenza strains has led to the screening of an increasing number of compounds for inhibitory activity against influenza neuraminidase. This study explores the chemical space of neuraminidase inhibitors (NAIs), which provides an opportunity to obtain further molecular insights into the underlying basis of their bioactivity. In particular, sets of 347 and 175 NAIs against influenza A and B, respectively, were compiled from the literature. Molecular and quantum chemical descriptors were obtained from low-energy conformational structures geometrically optimized at the PM6 level. The bioactivities of NAIs were classified as active or inactive according to their half-maximal inhibitory concentration (IC50), with IC50 < 1 µM defining active compounds and IC50 ≥ 10 µM defining inactive compounds. Interpretable decision rules were derived from a quantitative structure-activity relationship (QSAR) model built from a set of substructure descriptors via decision tree analysis. Univariate analysis, feature importance analysis from decision tree modeling, and molecular scaffold analysis were performed on both data sets to identify the structural features that discriminate active from inactive NAIs. Good predictive performance was achieved, with accuracy and Matthews correlation coefficient values in excess of 81% and 0.58, respectively, for both influenza A and B NAIs. Furthermore, molecular docking was employed to investigate the binding modes and moiety preferences of active NAIs against both influenza A and B neuraminidases. Moreover, novel NAIs with robust binding fitness towards influenza A and B neuraminidases were generated via combinatorial library enumeration, and their binding fitness was on par with or better than that of FDA-approved drugs. The results from this study are anticipated to be beneficial for guiding the rational drug design of novel NAIs for treating influenza infections.
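The IC50-based labeling and one of the reported metrics, the Matthews correlation coefficient, can be sketched as follows. Treating compounds with 1 µM ≤ IC50 < 10 µM as excluded is an assumption (the abstract defines only the two classes), and the helper names and IC50 values are hypothetical:

```python
import math


def label_ic50(ic50_um: float):
    """Class label from IC50 in µM: 1 = active (< 1 µM), 0 = inactive (>= 10 µM).

    Compounds between the two cut-offs get None; excluding them is an assumption
    of this sketch, since the abstract defines only the two classes.
    """
    if ic50_um < 1.0:
        return 1
    if ic50_um >= 10.0:
        return 0
    return None


def mcc(y_true: list[int], y_pred: list[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical IC50 values (µM); the 3.0 µM compound falls between the cut-offs
ic50s = {"cpd1": 0.2, "cpd2": 3.0, "cpd3": 25.0, "cpd4": 0.8}
labels = {cid: label_ic50(v) for cid, v in ic50s.items() if label_ic50(v) is not None}
print(labels)  # {'cpd1': 1, 'cpd3': 0, 'cpd4': 1}
print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))  # 1/sqrt(3), about 0.577
```

Returning 0.0 when a row or column of the confusion matrix is empty is a common convention, not something the abstract specifies.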
Project description: Because of advances in high-throughput screening technology, identifying a hit that binds to a target protein has become a relatively easy task; however, the subsequent hit-to-lead and lead optimization stages of drug discovery remain challenging. In a typical hit-to-lead and lead optimization process, analogues of the most promising hits are synthesized to develop a structure-activity relationship (SAR) analysis, which in turn guides further synthesis aimed at optimizing the lead compounds. These synthesis processes are usually long and labor-intensive. In silico searching has become an alternative approach to exploring SAR, especially now that millions of compounds, most of them readily obtainable, are available for screening. Here, we report our discovery of 15 new Dishevelled PDZ domain inhibitors using such an approach. We first developed a pharmacophore model based on NSC668036, an inhibitor previously identified in our laboratory; based on this model, we then screened the ChemDiv database using an algorithm that combines similarity search and docking procedures; finally, we selected potent inhibitors based on docking analysis and examined them using NMR spectroscopy. NMR experiments showed that all 15 compounds we chose bound to the PDZ domain more tightly than NSC668036.
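The abstract does not detail the combined similarity-and-docking algorithm. As a generic sketch of the idea only, a cheap similarity filter can decide which compounds reach the expensive docking step, whose scores are then used for ranking. The function, cut-off, and scores below are hypothetical, and lower docking scores are assumed to be better, as with many docking programs:

```python
from typing import Callable


def similarity_then_docking(
    similarity: dict[str, float],
    dock: Callable[[str], float],
    sim_cutoff: float,
    top_n: int,
) -> list[str]:
    """Two-stage virtual screen (hypothetical helper).

    similarity maps compound ID -> similarity to the pharmacophore/reference (0..1).
    Only compounds passing the cheap similarity filter are sent to the expensive
    docking step; lower docking scores are assumed to be better.
    """
    candidates = [cid for cid, sim in similarity.items() if sim >= sim_cutoff]
    scored = sorted((dock(cid), cid) for cid in candidates)
    return [cid for _, cid in scored[:top_n]]


# Hypothetical inputs; a real run would call a docking program inside dock()
similarity = {"c1": 0.90, "c2": 0.30, "c3": 0.70, "c4": 0.65}
docking_scores = {"c1": -7.5, "c2": -12.0, "c3": -9.1, "c4": -6.0}
docked = []


def dock(cid: str) -> float:
    docked.append(cid)  # record which compounds actually reach docking
    return docking_scores[cid]


print(similarity_then_docking(similarity, dock, sim_cutoff=0.6, top_n=2))  # ['c3', 'c1']
print(docked)  # ['c1', 'c3', 'c4']: c2 is never docked
```

The trade-off is visible in the toy data: c2 has the best docking score but never reaches docking, which is the price paid for screening a large database cheaply.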
Project description: The chemical profiles of kawakawa (Piper excelsum) leaves were analysed through targeted and non-targeted LC-MS/MS. The phytochemical profile was obtained for both aqueous extracts representative of kawakawa tea and methanolic extracts. Sixty-four compounds were identified from eight leaf sources including phenylpropanoids, lignans, flavonoids, alkaloids and amides. Eight of these compounds were absolutely quantified. The chemical content varied significantly by leaf source, with two commercially available sources of dried kawakawa leaves being relatively high in phenylpropanoids and flavonoids compared with field-collected fresh samples that were richer in amides, alkaloids and lignans. The concentrations of pharmacologically active metabolites ingested from the traditional consumption of kawakawa leaf as an aqueous infusion, or from novel use as a seasoning, are well below documented toxicity thresholds.
Project description: Visualization of chemical space is useful in many areas of chemistry, including compound library design, diversity analysis, and exploration of structure-property relationships. Drug discovery and natural product research are notable examples of fields where chemical space visualization has strong applications. However, the sheer size of even comparatively small sub-sections of chemical space means that approximations are needed when navigating it. ChemMaps is a visualization methodology that approximates the distribution of compounds in large datasets by selecting satellite compounds that yield a mapping similar to that of the whole dataset when principal component analysis is performed on a similarity matrix. Here, we show how the recently proposed extended similarity indices can help identify the regions from which satellites should be sampled and reduce the amount of high-dimensional data needed to describe a library's chemical space.
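A minimal sketch of the ChemMaps idea as described here: compute a compounds-by-satellites similarity matrix and project it onto its first two principal components. It assumes Tanimoto similarity on non-empty fingerprints stored as sets of on-bit indices, uses NumPy's SVD for the PCA, and picks satellites by hand; it does not implement the extended similarity indices used to choose them, and the function name is hypothetical:

```python
import numpy as np


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two non-empty fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def satellite_map(fps: list[set[int]], satellite_idx: list[int], n_components: int = 2) -> np.ndarray:
    """ChemMaps-style coordinates: PCA on the (compounds x satellites) similarity matrix."""
    sim = np.array([[tanimoto(f, fps[j]) for j in satellite_idx] for f in fps])
    centered = sim - sim.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical fingerprints; satellites chosen by hand here, not by extended similarity
fps = [{1, 2, 3}, {1, 2, 4}, {2, 3, 5}, {7, 8, 9}, {7, 8, 10}, {1, 9, 11}]
coords = satellite_map(fps, satellite_idx=[0, 3, 5])
print(coords.shape)  # (6, 2)
```

The saving comes from the matrix shape: each compound is compared with a few satellites rather than with every other compound, so the matrix grows linearly with library size instead of quadratically.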
Project description: Analyzing chemical datasets is a challenging task for researchers in chemoinformatics. It is important, yet difficult, to understand the relationship between the structure of chemical compounds, their physico-chemical properties, and their biological or toxic effects. Visualization tools can help researchers better comprehend the underlying correlations. Our recently developed 3D molecular viewer CheS-Mapper (Chemical Space Mapper) divides large datasets into clusters of similar compounds and arranges them in 3D space such that their spatial proximity reflects their similarity. The user can indirectly determine similarity by selecting which features to employ in the process. The tool can use and calculate different kinds of features, such as structural fragments and quantitative chemical descriptors. These features can be highlighted within CheS-Mapper, which helps chemists better understand patterns and regularities and relate their observations to established scientific knowledge. Finally, the tool can also be used to select and export specific subsets of a given dataset for further analysis.
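CheS-Mapper's own clustering and 3D embedding algorithms are not described in this abstract. To illustrate the general idea of grouping similar compounds, the sketch below swaps in simple greedy leader clustering on Tanimoto similarity, with fingerprint bits standing in for the user-selected features; the function name, threshold, and fingerprints are hypothetical:

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two non-empty fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: list[set[int]], threshold: float = 0.5) -> list[list[int]]:
    """Greedy leader clustering: each compound joins the first cluster whose leader
    is at least `threshold` similar to it, otherwise it starts a new cluster."""
    leaders: list[int] = []         # index of each cluster's leader compound
    clusters: list[list[int]] = []  # member indices of each cluster
    for i, fp in enumerate(fps):
        for c, lead in enumerate(leaders):
            if tanimoto(fp, fps[lead]) >= threshold:
                clusters[c].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


# Hypothetical fingerprints for five compounds
fps = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8}, {20}]
print(leader_cluster(fps, threshold=0.5))  # [[0, 1], [2, 3], [4]]
```

Leader clustering depends on input order and compares each compound only with cluster leaders, which keeps it fast on large datasets but makes it cruder than the hierarchical or k-means style methods a dedicated tool would typically offer.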