Dataset Information

Anatomy of high-performance 2D similarity calculations.


ABSTRACT: Similarity measures based on the comparison of dense bit vectors of two-dimensional chemical features are a dominant method in chemical informatics. For large-scale problems, including compound selection and machine learning, computing the intersection between two dense bit vectors is the overwhelming bottleneck. We describe efficient implementations of this primitive as well as example applications using features of modern CPUs that allow 20-40× performance increases relative to typical code. Specifically, we describe fast methods for population count on modern x86 processors and cache-efficient matrix traversal and leader clustering algorithms that alleviate memory bandwidth bottlenecks in similarity matrix construction and clustering. The speed of our 2D comparison primitives is within a small factor of that obtained on GPUs and does not require specialized hardware.
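As a rough illustration of the primitive the abstract refers to, the sketch below computes a bit-vector similarity from population counts and uses it in a simple greedy leader clustering. It is not the authors' implementation: the choice of the Tanimoto coefficient, the representation of fingerprints as Python integers, and the names `popcount`, `tanimoto`, and `leader_cluster` are assumptions made for this example, and it makes no attempt at the hardware popcount or cache-efficient traversal the abstract describes.

```python
def popcount(x: int) -> int:
    """Count the set bits of a non-negative integer (portable, not fast)."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints.

    The intersection is popcount(a & b); the union is popcount(a | b).
    """
    union = popcount(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return popcount(a & b) / union


def leader_cluster(fingerprints, threshold):
    """Greedy leader clustering (hypothetical helper for illustration).

    Each fingerprint joins the first existing leader whose similarity is
    at least `threshold`; otherwise it becomes a new leader.
    Returns (leader_indices, cluster_assignment_per_item).
    """
    leaders = []
    assignment = []
    for i, fp in enumerate(fingerprints):
        for cluster_id, leader_index in enumerate(leaders):
            if tanimoto(fp, fingerprints[leader_index]) >= threshold:
                assignment.append(cluster_id)
                break
        else:
            # No leader was similar enough: start a new cluster.
            leaders.append(i)
            assignment.append(len(leaders) - 1)
    return leaders, assignment


if __name__ == "__main__":
    fps = [0b1111, 0b1110, 0b110000, 0b100000]
    print(tanimoto(fps[0], fps[1]))
    print(leader_cluster(fps, 0.5))
```

In a performance-oriented setting the per-pair cost is dominated by the popcount of the intersection, which is why the abstract singles it out; this sketch only shows where that operation sits in the calculation.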

SUBMITTER: Haque IS 

PROVIDER: S-EPMC4839782 | biostudies-literature | 2011 Sep

REPOSITORIES: biostudies-literature

Publications

Haque IS, Pande VS, Walters WP. Anatomy of high-performance 2D similarity calculations. Journal of Chemical Information and Modeling, 2011-09-07, issue 9.



Similar Datasets

| S-EPMC8044177 | biostudies-literature
| S-EPMC5095783 | biostudies-literature
| S-EPMC11349043 | biostudies-literature
| S-EPMC9322642 | biostudies-literature
| S-EPMC5112518 | biostudies-literature
| S-EPMC5868182 | biostudies-literature
| S-EPMC6441005 | biostudies-literature
| S-EPMC7539184 | biostudies-literature
| S-EPMC6821766 | biostudies-literature
| S-EPMC5552882 | biostudies-other