
Dataset Information


ccbmlib - a Python package for modeling Tanimoto similarity value distributions.


ABSTRACT: The ccbmlib Python package is a collection of modules for modeling similarity value distributions based on Tanimoto coefficients for fingerprints available in RDKit. It can be used to assess the statistical significance of Tanimoto coefficients and evaluate how molecular similarity is reflected when different fingerprint representations are used. Significance measures derived from p-values allow a quantitative comparison of similarity scores obtained from different fingerprint representations that might have very different value ranges. Furthermore, the package models conditional distributions of similarity coefficients for a given reference compound. The conditional significance score estimates where a test compound would be ranked in a similarity search. The models are based on the statistical analysis of feature distributions and feature correlations of fingerprints of a reference database. The resulting models have been evaluated for 11 RDKit fingerprints, taking a collection of ChEMBL compounds as a reference data set. For most fingerprints, highly accurate models were obtained, with differences of 1% or less for Tanimoto coefficients indicating high similarity.
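The two quantities the abstract relies on, the Tanimoto coefficient and a p-value for it, can be illustrated with a minimal sketch. This is not the ccbmlib API and not the package's modeling approach: ccbmlib derives its distributions from feature statistics of a reference database, whereas the sketch below simply counts values in an explicit list of reference similarities. The function names `tanimoto` and `empirical_p_value`, the set-of-feature-indices fingerprint representation, and the sample numbers are all assumptions made for illustration.

```python
from bisect import bisect_left


def tanimoto(features_a, features_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' feature indices."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(a & b) / union


def empirical_p_value(tc, reference_values):
    """Fraction of reference similarity values that are >= tc.

    Illustrative stand-in for a modeled distribution: a small result means
    few reference pairs reach a similarity this high.
    """
    if not reference_values:
        raise ValueError("reference_values must not be empty")
    ordered = sorted(reference_values)
    # bisect_left returns the count of values strictly below tc.
    n_greater_equal = len(ordered) - bisect_left(ordered, tc)
    return n_greater_equal / len(ordered)


if __name__ == "__main__":
    # Hypothetical fingerprints: 2 shared features out of 6 distinct ones.
    fp_a = {1, 2, 3, 4}
    fp_b = {3, 4, 5, 6}
    tc = tanimoto(fp_a, fp_b)

    # Hypothetical reference similarities, e.g. from random compound pairs.
    reference = [0.1, 0.2, 0.3, 0.4, 0.5]
    print(tc, empirical_p_value(tc, reference))
```

Because a p-value of this kind is expressed relative to each fingerprint's own value distribution, it gives a common scale for scores from fingerprints whose raw Tanimoto ranges differ, which is the comparison the abstract describes.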

SUBMITTER: Vogt M 

PROVIDER: S-EPMC7050271 | biostudies-literature | 2020

REPOSITORIES: biostudies-literature


Publications

ccbmlib - a Python package for modeling Tanimoto similarity value distributions.

Vogt M, Bajorath J

F1000Research, 2020-02-10



Similar Datasets

| S-EPMC3906378 | biostudies-literature
| S-EPMC9251768 | biostudies-literature
| S-EPMC4734043 | biostudies-literature
| S-EPMC4837986 | biostudies-literature
| S-EPMC7597035 | biostudies-literature
| S-EPMC8138882 | biostudies-literature
| S-EPMC8168212 | biostudies-literature
| S-EPMC8275978 | biostudies-literature
| S-EPMC5022704 | biostudies-literature
| S-EPMC6454532 | biostudies-literature