Dataset Information

A new haplotype block detection method for dense genome sequencing data based on interval graph modeling of clusters of highly correlated SNPs.


ABSTRACT: Motivation: Linkage disequilibrium (LD) block construction is required for research in population genetics and genetic epidemiology, including specification of sets of single nucleotide polymorphisms (SNPs) for analysis of multi-SNP based association and identification of haplotype blocks in high density sequencing data. Existing methods based on a narrow sense definition do not allow intermediate regions of low LD between strongly associated SNP pairs and tend to split high density SNP data into small blocks having high between-block correlation. Results: We present Big-LD, a block partition method based on interval graph modeling of LD bins which are clusters of strong pairwise LD SNPs, not necessarily physically consecutive. Big-LD uses an agglomerative approach that starts by identifying small communities of SNPs, i.e. the SNPs in each LD bin region, and proceeds by merging these communities. We determine the number of blocks using a method to find maximum-weight independent set. Big-LD produces larger LD blocks compared to existing methods such as MATILDE, Haploview, MIG++, or S-MIG++ and the LD blocks better agree with recombination hotspot locations determined by sperm-typing experiments. The observed average runtime of Big-LD for 13 288 240 non-monomorphic SNPs from 1000 Genomes Project autosome data (286 East Asians) is about 5.83 h, which is a significant improvement over the existing methods. Availability and implementation: Source code and documentation are available for download at http://github.com/sunnyeesl/BigLD. Contact: yyoo@snu.ac.kr. Supplementary information: Supplementary data are available at Bioinformatics online.
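
The abstract mentions finding a maximum-weight independent set on an interval graph. As a rough illustration of that general idea only, and not the Big-LD implementation, the sketch below solves the problem for weighted closed intervals with a standard dynamic program over intervals sorted by end position. The function name, the `(start, end, weight)` tuples, and the example values are all hypothetical; how Big-LD actually defines its intervals and weights is not stated here.

```python
from bisect import bisect_left


def max_weight_independent_intervals(intervals):
    """Pick non-overlapping closed intervals with maximum total weight.

    `intervals` is a list of hypothetical (start, end, weight) tuples.
    Two intervals conflict if they share at least one position.
    Returns (total_weight, chosen_intervals_sorted_by_end).
    """
    ivs = sorted(intervals, key=lambda iv: iv[1])  # sort by end position
    ends = [iv[1] for iv in ivs]
    n = len(ivs)

    best = [0] * (n + 1)  # best[i]: optimum using the first i sorted intervals
    prev = [0] * n        # prev[i]: number of intervals ending strictly before ivs[i] starts

    for i, (start, _end, weight) in enumerate(ivs):
        prev[i] = bisect_left(ends, start)
        # Either skip interval i, or take it plus the best compatible prefix.
        best[i + 1] = max(best[i], best[prev[i]] + weight)

    # Walk back through the table to recover one optimal selection.
    chosen = []
    i = n
    while i > 0:
        if best[i] == best[i - 1]:
            i -= 1  # interval i-1 was not needed
        else:
            chosen.append(ivs[i - 1])
            i = prev[i - 1]
    chosen.reverse()
    return best[n], chosen


# Hypothetical candidate regions: (first SNP index, last SNP index, weight)
candidates = [(1, 5, 3), (4, 9, 5), (6, 10, 4), (11, 12, 1)]
total, selected = max_weight_independent_intervals(candidates)
```

On an interval graph this dynamic program runs in O(n log n) time, which is why the interval structure matters: maximum-weight independent set is hard on general graphs but tractable here.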

SUBMITTER: Kim SA 

PROVIDER: S-EPMC5860363 | biostudies-other | 2018 Feb

REPOSITORIES: biostudies-other

Publications

A new haplotype block detection method for dense genome sequencing data based on interval graph modeling of clusters of highly correlated SNPs.

Kim Sun Ah, Cho Chang-Sung, Kim Suh-Ryung, Bull Shelley B, Yoo Yun Joo

Bioinformatics (Oxford, England), 2018-02-01, issue 3


Similar Datasets

| S-EPMC3898000 | biostudies-literature
| S-EPMC7223266 | biostudies-literature
| S-EPMC2650788 | biostudies-literature
| S-EPMC5998902 | biostudies-literature
| S-EPMC3139583 | biostudies-literature
| S-EPMC3266881 | biostudies-literature
| S-EPMC2927760 | biostudies-literature
| S-EPMC5882691 | biostudies-other
2014-03-25 | GSE54882 | GEO
| S-EPMC1523234 | biostudies-literature