Dataset Information

MISHIMA--a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data.


ABSTRACT:

Background

Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates a need for new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches perform poorly when the number of sequences is large and when the sequences are genome-scale.

Results

We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are then assembled into a complete alignment of the original sequences.
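The anchor-and-split idea described above can be illustrated with a small sketch. This is not the MISHIMA implementation: it assumes that "rare" means "occurs exactly once in every sequence", uses a fixed oligonucleotide length `k`, and picks anchors with a simple greedy scan. The function names `unique_kmers`, `shared_anchors`, and `split_at_anchors` are hypothetical, and the in-memory dictionaries used here would not scale to full bacterial genomes.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i
            for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def shared_anchors(seqs, k):
    """Return anchor positions (one list per anchor, one entry per sequence).

    Assumption for this sketch: an anchor is a k-mer that is unique in
    every sequence. Anchors are kept greedily, in order of their position
    in the first sequence, only if they do not overlap or cross the
    previously kept anchor in any sequence.
    """
    tables = [unique_kmers(s, k) for s in seqs]
    common = set(tables[0]).intersection(*tables[1:])
    candidates = sorted([t[kmer] for t in tables] for kmer in common)
    chain = []
    for pos in candidates:
        if not chain or all(p >= q + k for p, q in zip(pos, chain[-1])):
            chain.append(pos)
    return chain


def split_at_anchors(seqs, chain, k):
    """Cut every sequence at the anchors.

    Returns a list of blocks; each block holds one fragment per sequence.
    Blocks alternate between inter-anchor fragments (to be handed to an
    external aligner) and the anchors themselves (already identical).
    """
    blocks = []
    starts = [0] * len(seqs)
    for pos in chain:
        blocks.append([s[a:p] for s, a, p in zip(seqs, starts, pos)])
        blocks.append([s[p:p + k] for s, p in zip(seqs, pos)])
        starts = [p + k for p in pos]
    blocks.append([s[a:] for s, a in zip(seqs, starts)])
    return blocks


seqs = ["TTACGTGGA", "CACGTCC", "ACGTAAA"]
chain = shared_anchors(seqs, 4)
blocks = split_at_anchors(seqs, chain, 4)
```

In this toy input the only 4-mer unique in all three sequences is `ACGT`, so each sequence is cut into a left fragment, the anchor, and a right fragment. Concatenating the blocks for any one sequence reproduces that sequence, which is the property that lets independently aligned fragments be reassembled into a full alignment.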

Conclusions

MISHIMA performs better than commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours on a single PC.

SUBMITTER: Kryukov K 

PROVIDER: S-EPMC2848238 | biostudies-literature | 2010 Mar

REPOSITORIES: biostudies-literature

Publications

MISHIMA--a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data.

Kryukov Kirill K, Saitou Naruya N

BMC Bioinformatics, 2010 Mar 18


Similar Datasets

| S-EPMC7094160 | biostudies-literature
| S-EPMC2896173 | biostudies-literature
| S-EPMC4424971 | biostudies-literature
| S-EPMC6137996 | biostudies-literature
| S-EPMC2647288 | biostudies-literature
| S-EPMC10809904 | biostudies-literature
| S-EPMC525693 | biostudies-literature
| S-EPMC8998981 | biostudies-literature
| S-EPMC9602327 | biostudies-literature
| S-EPMC2923138 | biostudies-literature