Ontology highlight
ABSTRACT: Background
Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specificity of such computation will have broad impacts as it allows scientists to more completely explore the wealth of scientific data.Results
The Blazing Signature Filter (BSF) is a highly efficient pairwise similarity algorithm which enables extensive data mining within a reasonable amount of time. The algorithm transforms datasets into binary metrics, allowing it to utilize the computationally efficient bit operators and provide a coarse measure of similarity. We demonstrate the utility of our algorithm using two common bioinformatics tasks: identifying data sets with similar gene expression profiles, and comparing annotated genomes.Conclusions
The BSF is a highly efficient pairwise similarity algorithm that can scale to billions of comparisons without the need for specialized hardware.
SUBMITTER: Lee JY
PROVIDER: S-EPMC6047367 | biostudies-literature | 2018 Jun
REPOSITORIES: biostudies-literature
Lee Joon-Yong JY Fujimoto Grant M GM Wilson Ryan R Wiley H Steven HS Payne Samuel H SH
BMC bioinformatics 20180611 1
<h4>Background</h4>Identifying similarities between datasets is a fundamental task in data mining and has become an integral part of modern scientific investigation. Whether the task is to identify co-expressed genes in large-scale expression surveys or to predict combinations of gene knockouts which would elicit a similar phenotype, the underlying computational task is often a multi-dimensional similarity test. As datasets continue to grow, improvements to the efficiency, sensitivity or specifi ...[more]