
Dataset Information


An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs.


ABSTRACT: Many modern applications of AI, such as web search, mobile browsing, image processing, and natural language processing, rely on finding similar items in a large database of complex objects. Because of the very large scale of the data involved (e.g., users' queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations that improve performance and are suitable for deployment in very large scale settings. The experimental results demonstrate that our variants of LSH achieve robust performance, with better recall than "vanilla" LSH, even when using the same amount of space.
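For readers unfamiliar with the technique named in the abstract, the following is a minimal single-machine sketch of the general multi-probe idea, not the paper's Hadoop implementation or any of its four variants. It assumes random-hyperplane (sign) hashing for cosine similarity and probes buckets at Hamming distance 1 from the query's bucket; all function names (`make_hyperplanes`, `signature`, `probe_keys`, `build_index`, `query`) are hypothetical and chosen for illustration only.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def probe_keys(sig):
    """Yield the exact bucket key, then every key that differs in exactly one bit."""
    yield sig
    for i in range(len(sig)):
        yield sig[:i] + (1 - sig[i],) + sig[i + 1:]


def build_index(items, planes):
    """Map each signature to the list of item names hashed into that bucket."""
    index = {}
    for name, vec in items.items():
        index.setdefault(signature(vec, planes), []).append(name)
    return index


def query(vec, index, planes, multi_probe=True):
    """Return candidate names from the query's bucket and, optionally, neighboring buckets."""
    sig = signature(vec, planes)
    keys = probe_keys(sig) if multi_probe else [sig]
    found = []
    for key in keys:
        found.extend(index.get(key, []))
    return found


# Toy usage with two axis-aligned hyperplanes so the buckets are easy to follow.
planes = [[1.0, 0.0], [0.0, 1.0]]
items = {"a": [1.0, 1.0], "b": [1.0, -1.0]}
index = build_index(items, planes)
print(query([1.0, -0.5], index, planes, multi_probe=False))  # ['b']
print(query([1.0, -0.5], index, planes))                     # ['b', 'a']
```

In this toy case the single-bucket lookup misses item `a`, while probing neighboring buckets of the same table recovers it. This illustrates how probing can raise recall without adding hash tables, which is the kind of space trade-off the abstract refers to.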

SUBMITTER: Cormode G 

PROVIDER: S-EPMC5773183 | biostudies-literature | 2018

REPOSITORIES: biostudies-literature


Publications

An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs.

Cormode G, Dasgupta A, Goyal A, Lee CH

PLoS ONE, 2018-01-18, issue 1



Similar Datasets

| S-EPMC8340999 | biostudies-literature
| S-EPMC6612865 | biostudies-other
| S-EPMC10538361 | biostudies-literature
| S-EPMC10985673 | biostudies-literature
| S-EPMC7669687 | biostudies-literature
| S-EPMC4393915 | biostudies-other
| S-EPMC9301846 | biostudies-literature
| S-EPMC2844998 | biostudies-literature
| S-EPMC7148242 | biostudies-literature
| S-EPMC4522652 | biostudies-literature