ABSTRACT:
Motivation: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis.
Results: We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on the triangle inequality. The database subsequence clustering technique achieved a ∼2-fold increase in speed without a large decrease in search sensitivity. In measurements on metagenomic data, GHOSTZ was ∼2.2-2.8 times faster than RAPSearch and ∼185-261 times faster than BLASTX.
Availability and implementation: The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/
Contact: akiyama@cs.titech.ac.jp
Supplementary information: Supplementary data are available at Bioinformatics online.
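The candidate reduction described in the Results can be illustrated with a minimal Python sketch. It is not the GHOSTZ implementation: the Hamming distance, the greedy clustering, and the names `hamming`, `cluster`, and `search` are assumptions chosen for illustration only. The sketch shows how storing each member's distance to its cluster representative lets the triangle inequality rule out members without comparing them to the query.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def cluster(subseqs, radius):
    """Greedy clustering (illustrative assumption): each subsequence joins the
    first representative within `radius`, otherwise it starts a new cluster."""
    clusters = []  # list of (representative, [(member, dist_to_rep), ...])
    for s in subseqs:
        for rep, members in clusters:
            d = hamming(rep, s)
            if d <= radius:
                members.append((s, d))
                break
        else:
            clusters.append((s, [(s, 0)]))
    return clusters


def search(query, clusters, max_dist):
    """Return (hits, number of distance computations performed)."""
    hits, computed = [], 0
    for rep, members in clusters:
        d_qr = hamming(query, rep)
        computed += 1
        for m, d_rm in members:
            # Triangle inequality: d(query, m) >= |d(query, rep) - d(rep, m)|.
            # If this lower bound already exceeds max_dist, m cannot be a hit.
            if abs(d_qr - d_rm) > max_dist:
                continue
            d = hamming(query, m)
            computed += 1
            if d <= max_dist:
                hits.append(m)
    return hits, computed


# Hypothetical toy database of fixed-length protein subsequences.
db = ["ACDEFG", "ACDEFH", "ACDEYH", "WWWWWW", "WWWWWY"]
clusters = cluster(db, radius=2)
hits, computed = search("ACDEFG", clusters, max_dist=1)
```

In this toy case the search performs fewer distance computations than comparing the query against all five database subsequences, because three members are discarded by the lower bound alone. Any real speed-up depends on the distance measure and on how tightly the clusters are formed.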
SUBMITTER: Suzuki S
PROVIDER: S-EPMC4393512 | biostudies-literature | 2015 Apr
REPOSITORIES: biostudies-literature
Suzuki Shuji, Kakuta Masanori, Ishida Takashi, Akiyama Yutaka
Bioinformatics (Oxford, England), 2014-11-27, issue 8