ABSTRACT:
SUBMITTER: Yan C
PROVIDER: S-EPMC5344375 | biostudies-other | 2017
REPOSITORIES: biostudies-other
Yan Cairong, Zhao Xue, Zhang Qinglong, Huang Yongfeng
PLoS ONE, 2017-03-09, issue 3
In the big data area, a significant challenge for string similarity join is finding all similar pairs efficiently. In this paper, we propose a parallel processing framework for efficient string similarity join. First, the input is split into disjoint small subsets according to the joint frequency distribution and the interval distribution of strings. Then the filter-verification strategy is applied to compute string similarity within each subset so that the number of candidate pai ...
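The filter-verification idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes edit distance as the similarity measure and a simple length filter as the pruning step. The abstract confirms neither choice, and the function names are hypothetical. The partitioning by frequency and interval distributions and the parallel execution are left out.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # Deletion, insertion, or substitution/match.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity_join(strings, tau):
    """Return pairs (s, t) with s < t and edit_distance(s, t) <= tau.

    Assumption: the similarity measure and the filter are illustrative
    stand-ins, not the method of the paper.
    """
    results = []
    for s, t in combinations(sorted(set(strings)), 2):
        # Filter: a length gap above tau already implies distance > tau,
        # so the pair is discarded without computing the distance.
        if abs(len(s) - len(t)) > tau:
            continue
        # Verification: compute the exact distance for surviving candidates.
        if edit_distance(s, t) <= tau:
            results.append((s, t))
    return results
```

Calling `similarity_join(["kitten", "sitten", "sitting", "cat"], 1)` skips every pair involving `"cat"` at the filter step, so the quadratic distance computation runs only on the three remaining candidates.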