Dataset Information

Faster sequence homology searches by clustering subsequences.


ABSTRACT: MOTIVATION: Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. RESULTS: We developed a fast homology search method based on database subsequence clustering, and implemented it as GHOSTZ. This method clusters similar subsequences from a database to perform an efficient seed search and ungapped extension by reducing alignment candidates based on the triangle inequality. The database subsequence clustering technique achieved an ~2-fold increase in speed without a large decrease in search sensitivity. On metagenomic data, GHOSTZ is ~2.2-2.8 times faster than RAPSearch and ~185-261 times faster than BLASTX. AVAILABILITY AND IMPLEMENTATION: The source code is freely available for download at http://www.bi.cs.titech.ac.jp/ghostz/ CONTACT: akiyama@cs.titech.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
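The abstract does not spell out how the triangle inequality reduces alignment candidates, so the following is only a minimal sketch of the general idea, not the GHOSTZ implementation. It assumes Hamming distance as a stand-in metric, a simple greedy clustering, and hypothetical names (`Cluster`, `build_clusters`, `search`) that do not come from the GHOSTZ source. Each cluster stores the distance from its representative to every member; at query time, a member is skipped whenever the bound |d(query, representative) - d(representative, member)| already exceeds the allowed distance.

```python
from dataclasses import dataclass, field


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("subsequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


@dataclass
class Cluster:
    representative: str
    # Each entry is (member subsequence, precomputed distance to representative).
    members: list = field(default_factory=list)


def build_clusters(subseqs, radius):
    """Greedy clustering (an assumption): join the first cluster whose
    representative is within `radius`, otherwise start a new cluster."""
    clusters = []
    for s in subseqs:
        for c in clusters:
            d = hamming(s, c.representative)
            if d <= radius:
                c.members.append((s, d))
                break
        else:
            clusters.append(Cluster(representative=s))
    return clusters


def search(query, clusters, max_dist):
    """Return (hits, number of full distance computations performed)."""
    hits, full_comparisons = [], 0
    for c in clusters:
        d_qr = hamming(query, c.representative)
        full_comparisons += 1
        if d_qr <= max_dist:
            hits.append(c.representative)
        for member, d_rm in c.members:
            # Triangle inequality: d(query, member) >= |d_qr - d_rm|.
            # If that lower bound exceeds max_dist, the member cannot match.
            if abs(d_qr - d_rm) > max_dist:
                continue
            full_comparisons += 1
            if hamming(query, member) <= max_dist:
                hits.append(member)
    return hits, full_comparisons


# Toy database of fixed-length protein subsequences (illustrative data only).
database = ["ACDEFG", "ACDEFH", "ACDEYH", "WWWWWW", "WWWWWY"]
clusters = build_clusters(database, radius=2)
hits, full_comparisons = search("ACDEFG", clusters, max_dist=1)
print(hits, full_comparisons)  # ['ACDEFG', 'ACDEFH'] 3
```

In this toy run the search performs 3 full comparisons instead of the 5 a brute-force scan would need, while returning the same hits. Pruning is exact here because Hamming distance is a true metric; whether the same guarantee holds for the similarity measure GHOSTZ actually uses is not stated in the abstract.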

SUBMITTER: Suzuki S 

PROVIDER: S-EPMC4393512 | biostudies-literature | 2015 Apr

REPOSITORIES: biostudies-literature

Publications

Faster sequence homology searches by clustering subsequences.

Suzuki Shuji, Kakuta Masanori, Ishida Takashi, Akiyama Yutaka

Bioinformatics (Oxford, England), 2014-11-27, issue 8


Similar Datasets

| S-EPMC4970815 | biostudies-literature
| S-EPMC3120707 | biostudies-literature
| S-EPMC107726 | biostudies-literature
| S-EPMC4086066 | biostudies-literature
| S-EPMC7267824 | biostudies-literature
| S-EPMC10491838 | biostudies-literature
| S-EPMC208762 | biostudies-literature
| S-EPMC4177039 | biostudies-literature
| S-EPMC4568567 | biostudies-literature
| S-EPMC3403935 | biostudies-literature