Dataset Information

Perfect Hamming code with a hash table for faster genome mapping.

ABSTRACT:

Background

With the advent of next-generation sequencers, the growing demand to map short DNA sequences to a genome has promoted the development of fast algorithms and tools. The tools commonly used today are based on either a hash table or the suffix array/Burrows-Wheeler transform. These algorithms are best suited to finding the genome position of exactly matching short reads. However, they have limited capacity to handle mismatches. To find n-mismatches, they require O(2ⁿ) times the computation time of exact matches. Therefore, acceleration techniques are required.
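As a rough illustration of why mismatches are costly for a hash table keyed on exact subsequences, the sketch below counts how many distinct keys lie within a given number of substitutions of one subsequence. The function name `mismatch_neighbourhood_size` and the naive "enumerate every variant" model are assumptions made for illustration; they are not necessarily the cost model behind the complexity figure quoted above.

```python
from math import comb


def mismatch_neighbourhood_size(length: int, max_mismatches: int) -> int:
    """Count sequences over {A, C, G, T} that differ from a fixed
    sequence of the given length in at most max_mismatches positions."""
    # Choose i positions to change; each changed position has 3 alternatives.
    return sum(comb(length, i) * 3**i for i in range(max_mismatches + 1))


if __name__ == "__main__":
    for n in range(3):
        print(n, mismatch_neighbourhood_size(21, n))
    # prints: 0 1, then 1 64, then 2 1954
```

Under this naive model, a 21-base subsequence with up to two substitutions corresponds to 1954 candidate keys, which is the kind of lookup count a hash-based mapper would want to reduce.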

Results

We propose a hash-based method for genome mapping that reduces the number of hash references for finding mismatches without increasing the size of the hash table. The method regards DNA subsequences as words over the Galois extension field GF(2²), and each word is encoded to a code word of a perfect Hamming code. The perfect Hamming code defines equivalence classes of DNA subsequences. Each equivalence class includes the subsequences whose corresponding words over GF(2²) are encoded to the same code word. The code word is used as a hash key to store these subsequences in a hash table. Specifically, the method reduces by about 70% the number of hash keys necessary for searching the genome positions of all 2-mismatches of a 21-base-long DNA subsequence.
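The idea of using a code word as a hash key can be sketched as follows. A perfect Hamming code over GF(2²) with three check symbols has length (4³ − 1)/3 = 21, so every 21-base subsequence lies within one substitution of exactly one code word. The base-to-symbol mapping (A=0, C=1, G=2, T=3), the particular parity-check matrix, and the function names are assumptions for illustration; the paper's actual construction may differ.

```python
from itertools import product

# GF(4) elements are encoded as 0..3; addition is XOR, multiplication by table.
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]
GF4_INV = {1: 1, 2: 3, 3: 2}
BASE_TO_GF4 = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed mapping
GF4_TO_BASE = "ACGT"


def _first_nonzero(vec):
    return next(x for x in vec if x != 0)


# Parity-check columns: one representative per line through the origin in
# GF(4)^3, i.e. every nonzero vector whose first nonzero entry is 1 (21 total).
COLUMNS = [
    v for v in product(range(4), repeat=3)
    if any(v) and _first_nonzero(v) == 1
]
COLUMN_INDEX = {col: i for i, col in enumerate(COLUMNS)}


def syndrome(word):
    """Return H * word over GF(4) as a 3-tuple."""
    s = [0, 0, 0]
    for symbol, col in zip(word, COLUMNS):
        for r in range(3):
            s[r] ^= GF4_MUL[symbol][col[r]]
    return tuple(s)


def nearest_codeword(kmer: str) -> str:
    """Map a 21-base subsequence to the unique code word within distance 1."""
    if len(kmer) != len(COLUMNS):
        raise ValueError("expected a 21-base subsequence")
    word = [BASE_TO_GF4[b] for b in kmer]
    s = syndrome(word)
    if any(s):
        # s = e * h_j for exactly one column h_j and one nonzero e.
        e = _first_nonzero(s)
        col = tuple(GF4_MUL[GF4_INV[e]][x] for x in s)
        word[COLUMN_INDEX[col]] ^= e  # subtracting e equals XOR in GF(4)
    return "".join(GF4_TO_BASE[x] for x in word)
```

In this sketch, a hash table keyed on `nearest_codeword(kmer)` places each subsequence in the same bucket as the code word it is closest to, so one lookup covers a whole equivalence class rather than a single exact subsequence.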

Conclusions

The paper shows that a perfect Hamming code can reduce the number of hash references for hash-based genome mapping. As the computation time to calculate code words is far shorter than a hash reference, our method is effective in reducing the computation time to map short DNA sequences to a genome. The amount of data that DNA sequencers generate continues to increase, and more accurate genome mappings are required. Thus, our method will be a key technology for developing faster genome mapping software.

SUBMITTER: Takenaka Y

PROVIDER: S-EPMC3333191 | biostudies-literature | 2011 Nov

REPOSITORIES: biostudies-literature

Publications

Perfect Hamming code with a hash table for faster genome mapping.

Takenaka Y, Seno S, Matsuda H

BMC Genomics, 2011 Nov 30

PMID: 22369457

Similar Datasets

Project description: The geometry of the characteristic element forming the artificial structure of an electromagnetic metamaterial defines the way the metamaterial will interact with electromagnetic waves, and accordingly, how it will transmit, reflect, and absorb electromagnetic energy. Metamaterials have been discovered that can manipulate electromagnetic waves to create perfect absorption of incident electromagnetic energy using relatively simple elemental geometries. But the phenomenon is confined to very narrow frequency bandwidths owing to the mono-resonance characteristics of simple cellular structures. Complex cellular geometries based on the combination of many different fundamental building blocks may be able to constructively couple many more resonances and broaden the perfect absorption bandwidth. We describe here a metasurface based upon geometric inversion of a set of conformal mapping contours. The resulting geometry forms a nearly continuous series of perfect absorption resonances within an ultrathin (λ/165) metasurface to develop broadband absorption in a frequency range of interest for downhole chemical spectroscopy. The metasurface is derived from a geometric inversion of the Rhodonea conformal mapping contours, more commonly called four-leaf roses, and was found to exhibit near zero index metamaterial (NZIM) behavior. An uncooled microbolometer design is described that uses the metasurface geometry on a single VO₂ thermometric substrate, leading to an infrared detector with predicted maximum absorption of 99.94% at 4.3 μm and an absorption bandwidth of 170% FWHM on 15.8 μm center wavelength, coincident with important chemical spectra of downhole hydrocarbons. The infrared detector design has a predicted maximum detectivity D* = 1.5 × 10⁹ cm·Hz^(1/2)/W and noise equivalent difference temperature NEDT of 70 mK at a frame rate of 60 Hz. These levels of detector performance conventionally would be achievable only with cryogenically cooled technologies and could represent a significant step in the effort towards deploying an in situ infrared chemical spectroscopy sensor into downhole logging applications.
