Dataset Information

Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies.

ABSTRACT: The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity of the statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm that employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed based on this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. This proposed test assesses, based on Hamming distance, whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. In our proposed methodology, only genotype information is needed. No inference of haplotypes is required, and the SNPs under consideration do not need to be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. Compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs that exert a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, applying the clustering algorithm before testing a large data set helps to confine the genetic regions that may contain susceptibility markers.
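
The two uses of Hamming distance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy genotype matrix, the 0/1/2 coding, the case/control labels, and the helper names `hamming` and `mean_distance` are all assumptions made for illustration, and the abstract does not give the actual HDAT statistic or the clustering rule. The sketch only computes SNP-to-SNP distances (the input a dendrogram would be built from) and compares the mean distance between individuals of the same and of different disease status.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: rows are individuals, columns are SNPs,
# genotypes coded as 0/1/2 (an assumed coding, not taken from the paper).
genotypes = [
    [0, 0, 2, 1],
    [1, 1, 2, 0],
    [2, 2, 0, 1],
    [2, 1, 0, 0],
]
status = [1, 1, 0, 0]  # assumed labels: 1 = diseased, 0 = normal

# Clustering view: each SNP is a string of genotypes across individuals,
# so SNP-to-SNP distances compare the columns of the matrix.
snps = list(zip(*genotypes))
snp_dist = {
    (i, j): hamming(snps[i], snps[j])
    for i, j in combinations(range(len(snps)), 2)
}
closest_snps = min(snp_dist, key=snp_dist.get)  # first candidates to merge


def mean_distance(rows, labels, same_status):
    """Mean Hamming distance over pairs of individuals whose disease
    status is equal (same_status=True) or different (same_status=False)."""
    dists = [
        hamming(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
        if (labels[i] == labels[j]) == same_status
    ]
    return sum(dists) / len(dists)


# Testing view: compare pairs of the same status with pairs of different status.
within = mean_distance(genotypes, status, same_status=True)
between = mean_distance(genotypes, status, same_status=False)
print(closest_snps, within, between)
```

A real analysis would feed the SNP distances into a hierarchical clustering routine and would need a reference distribution (for example, from permuting the status labels) to judge whether the within-status and between-status similarities differ; both steps are left out of this sketch.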

SUBMITTER: Wang C

PROVIDER: S-EPMC4547758 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature

Publications

Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies.

Wang C, Kao WH, Hsiao CK

PLoS ONE, 2015-08-24, issue 8

PMID: 26302001

Similar Datasets

Genome-scale NCRNA homology search using a Hamming distance-based filtration strategy.
| S-EPMC3311100 | biostudies-literature

Gene-Based Nonparametric Testing of Interactions Using Distance Correlation Coefficient in Case-Control Association Studies.
| S-EPMC6316506 | biostudies-literature

Testing SNPs and sets of SNPs for importance in association studies.
| S-EPMC3006123 | biostudies-literature

Hamming Distance as a Concept in DNA Molecular Recognition.
| S-EPMC5410656 | biostudies-literature

Distance-based clustering challenges for unbiased benchmarking studies.
| S-EPMC8460803 | biostudies-literature

Evaluation of gene-expression clustering via mutual information distance measure.
| S-EPMC1858704 | biostudies-literature

The Generalized Higher Criticism for Testing SNP-Set Effects in Genetic Association Studies.
| S-EPMC5517103 | biostudies-literature

Simultaneous discovery and testing of deletions for disease association in SNP genotyping studies.
| S-EPMC2227920 | biostudies-literature

Implementation of a Hamming distance-like genomic quantum classifier using inner products on ibmqx2 and ibmq_16_melbourne.
| S-EPMC7446251 | biostudies-literature

SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies.
| S-EPMC2703930 | biostudies-literature