Dataset Information

Using alternative SMILES representations to identify novel functional analogues in chemical similarity vector searches.


ABSTRACT: Chemical similarity searches are a widely used family of in silico methods for identifying pharmaceutical leads. These methods historically relied on structure-based comparisons to compute similarity. Here, we use a chemical language model to create a vector-based chemical search. We extend previous implementations by creating a prompt engineering strategy that utilizes two different chemical string representation algorithms: one for the query and the other for the database. We explore this method by reviewing search results from nine queries with diverse targets. We find that the method identifies molecules with similar patent-derived functionality to the query, as determined by our validated LLM-assisted patent summarization pipeline. Further, many of these functionally similar molecules have different structures and scaffolds from the query, making them unlikely to be found with traditional chemical similarity searches. This method may serve as a new tool for the discovery of novel molecular structural classes that achieve target functionality.
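The search pattern described in the abstract (embed the query written in one string representation, embed the database in another, rank by vector similarity) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the character n-gram counter is a toy stand-in for the chemical language model, and the function names `embed`, `cosine`, and `search`, the three-molecule database, and the choice of SMILES strings are hypothetical and not taken from the publication.

```python
import math
from collections import Counter


def embed(smiles: str, n: int = 2) -> Counter:
    """Toy stand-in for a chemical language model (assumption):
    maps a SMILES string to a sparse vector of character n-gram counts."""
    padded = f"^{smiles}$"  # mark the start and end of the string
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_repr: str, database: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank database entries by similarity to the query embedding."""
    q = embed(query_repr)
    scored = [(name, cosine(q, embed(s))) for name, s in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical database: each molecule stored in one SMILES writing.
database = {
    "ethanol": "CCO",
    "acetic acid": "CC(=O)O",
    "benzene": "c1ccccc1",
}

# The query is the same molecule (ethanol) written as a different,
# equally valid SMILES string, mimicking the two-representation setup.
query = "OCC"
results = search(query, database)

for name, score in results:
    print(f"{name}: {score:.3f}")
```

In a real pipeline the two representations would come from two string-generation algorithms applied consistently to the query and to the database, and the embedding would come from a trained model; the toy vectors above only show where each piece plugs in.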

SUBMITTER: Kosonocky CW 

PROVIDER: S-EPMC10724362 | biostudies-literature | 2023 Dec

REPOSITORIES: biostudies-literature

Publications

Using alternative SMILES representations to identify novel functional analogues in chemical similarity vector searches.

Kosonocky CW, Feller AL, Wilke CO, Ellington AD

Patterns (New York, N.Y.), 2023-10-30; 12


Similar Datasets

| S-EPMC4375400 | biostudies-literature
| S-EPMC2441795 | biostudies-literature
| S-EPMC2847166 | biostudies-literature
| S-EPMC10517535 | biostudies-literature
| S-EPMC1489924 | biostudies-literature
| S-EPMC3240574 | biostudies-literature
| S-EPMC11256111 | biostudies-literature
| S-EPMC2853128 | biostudies-literature
| S-EPMC5673929 | biostudies-literature
| S-EPMC6439793 | biostudies-other