Dataset Information

HIA: a genome mapper using hybrid index-based sequence alignment.


ABSTRACT: BACKGROUND: A number of alignment tools have been developed to align sequencing reads to the human reference genome. The scale of information from next-generation sequencing (NGS) experiments, however, is increasing rapidly. Recent studies based on NGS technology have routinely produced exome or whole-genome sequences from several hundred to several thousand samples. To accommodate the growing need to analyze very large NGS data sets, faster, more sensitive, and more accurate mapping tools are needed. RESULTS: HIA uses two indices, a hash table index and a suffix array index. The hash table performs direct lookup of a q-gram, and the suffix array performs very fast lookup of variable-length strings by exploiting binary search. We observed that combining the hash table and the suffix array (the hybrid index) is much faster than the suffix array alone for finding a substring in the reference sequence. We define a matching region (MR) as a longest common substring between the reference and a read, and a candidate alignment region (CAR) as a list of MRs that are close to each other. The hybrid index is used to find CARs between the reference and a read. We found that aligning only the unmatched regions in a CAR is much faster than aligning the whole CAR. In benchmark analysis, HIA mapped reads faster than the other aligners tested, without significant loss of mapping accuracy. CONCLUSIONS: Our experiments show that the hybrid of a hash table and a suffix array is useful in terms of speed for mapping NGS reads to the human reference genome sequence. HIA is therefore appropriate for aligning the massive data sets generated by NGS.
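
The hybrid lookup described in the abstract can be illustrated with a small sketch. This is not HIA's implementation: it assumes one plausible way of combining the two indices, in which the hash table maps each q-gram to the interval of suffix-array rows that begin with it, so the binary search for a longer pattern starts inside that interval instead of over the whole array. The function names (`build_suffix_array`, `build_qgram_table`, `find`), the toy reference string, and the choice of `q = 3` are all hypothetical.

```python
def build_suffix_array(ref):
    # Naive construction by sorting suffixes; acceptable only for a toy reference.
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def build_qgram_table(ref, sa, q):
    # Hash table: q-gram -> half-open interval [lo, hi) of suffix-array rows
    # whose suffix starts with that q-gram. Such rows are contiguous because
    # the suffix array is lexicographically sorted.
    table = {}
    for row, pos in enumerate(sa):
        gram = ref[pos:pos + q]
        if len(gram) < q:
            continue  # suffix shorter than q cannot start with any q-gram
        lo, _ = table.get(gram, (row, row))
        table[gram] = (lo, row + 1)
    return table


def find(ref, sa, table, q, pattern):
    # Return the sorted reference positions at which `pattern` occurs.
    # Assumption of this sketch: the pattern is at least q characters long.
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    interval = table.get(pattern[:q])  # direct q-gram lookup
    if interval is None:
        return []
    lo, hi = interval
    m = len(pattern)

    # Binary search for the first row whose length-m prefix is >= pattern.
    left, right = lo, hi
    while left < right:
        mid = (left + right) // 2
        if ref[sa[mid]:sa[mid] + m] < pattern:
            left = mid + 1
        else:
            right = mid
    start = left

    # Binary search for the first row whose length-m prefix is > pattern.
    right = hi
    while left < right:
        mid = (left + right) // 2
        if ref[sa[mid]:sa[mid] + m] <= pattern:
            left = mid + 1
        else:
            right = mid
    return sorted(sa[start:left])


ref = "ACGTACGTTACG"   # toy reference, hypothetical
q = 3                  # hypothetical q-gram length
sa = build_suffix_array(ref)
table = build_qgram_table(ref, sa, q)
hits = find(ref, sa, table, q, "ACGT")
```

In this sketch the q-gram lookup replaces the first steps of the binary search, which is one way the combination could be faster than searching the suffix array alone; the abstract does not specify how HIA couples the two indices.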

SUBMITTER: Choi J 

PROVIDER: S-EPMC4688996 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

HIA: a genome mapper using hybrid index-based sequence alignment.

Jongpill Choi, Kiejung Park, Seong Beom Cho, Myungguen Chung

Algorithms for Molecular Biology: AMB, 2015-12-23


Similar Datasets

| S-EPMC6902276 | biostudies-literature
| S-EPMC419454 | biostudies-literature
| S-EPMC3431198 | biostudies-literature
| S-EPMC4410667 | biostudies-literature
2011-08-03 | GSE26248 | GEO
| S-EPMC4133627 | biostudies-literature
2024-10-10 | PXD050548 | Pride
| S-EPMC1579236 | biostudies-literature
| S-EPMC8289385 | biostudies-literature
2011-08-03 | E-GEOD-26248 | biostudies-arrayexpress