
Dataset Information


Second-generation PLINK: rising to the challenge of larger and richer datasets.


ABSTRACT: BACKGROUND: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and more scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. FINDINGS: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(√n)-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0). CONCLUSIONS: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
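The bit-level parallelism mentioned in the abstract can be illustrated with a small sketch. It assumes the PLINK 1 .bed 2-bit genotype encoding (00 homozygous for the first allele, 01 missing, 10 heterozygous, 11 homozygous for the second allele, first sample in the lowest bits) and counts all four genotype classes in one 64-bit word using a few masks and population counts instead of a per-sample loop. The function names `pack` and `count_genotypes` are hypothetical; this is an illustration of the general technique, not PLINK 1.9's actual code.

```python
# Hypothetical sketch of bit-parallel genotype counting (not PLINK 1.9's code).
# Assumed PLINK 1 .bed 2-bit codes: 0b00 = hom. first allele, 0b01 = missing,
# 0b10 = heterozygous, 0b11 = hom. second allele; sample 0 in the lowest 2 bits.

SAMPLES_PER_WORD = 32
LOW_MASK = 0x5555555555555555  # selects the low bit of each 2-bit field


def popcount(x: int) -> int:
    """Count set bits of a non-negative int (portable to Python < 3.10)."""
    return bin(x).count("1")


def pack(genotypes: list[int]) -> int:
    """Pack up to 32 two-bit genotype codes (0..3) into one 64-bit word."""
    assert len(genotypes) <= SAMPLES_PER_WORD
    word = 0
    for i, g in enumerate(genotypes):
        word |= (g & 0b11) << (2 * i)
    return word


def count_genotypes(word: int, n_samples: int) -> tuple[int, int, int, int]:
    """Return (hom_first, missing, het, hom_second) for one packed word."""
    # Restrict to the low bit of each occupied field so padding is ignored.
    valid = LOW_MASK & ((1 << (2 * n_samples)) - 1)
    lo = word & valid          # low bit of each field, aligned to even positions
    hi = (word >> 1) & valid   # high bit of each field, aligned the same way
    hom_first = popcount(valid & ~lo & ~hi)  # 00
    missing = popcount(lo & ~hi)             # 01
    het = popcount(~lo & hi)                 # 10
    hom_second = popcount(lo & hi)           # 11
    return hom_first, missing, het, hom_second


if __name__ == "__main__":
    word = pack([0, 2, 3, 1, 2, 0])
    print(count_genotypes(word, 6))  # (2, 1, 2, 1)
```

Each mask-and-popcount step classifies 32 samples at once, which is why this style of code scales far better than testing genotypes one by one.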

SUBMITTER: Chang CC 

PROVIDER: S-EPMC4342193 | biostudies-literature | 2015

REPOSITORIES: biostudies-literature


Publications

Second-generation PLINK: rising to the challenge of larger and richer datasets.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ

GigaScience, 2015-02-25



Similar Datasets

| S-EPMC4237170 | biostudies-literature
| S-EPMC3077236 | biostudies-literature
| S-EPMC5570257 | biostudies-literature
| S-EPMC8034636 | biostudies-literature
| S-EPMC7500662 | biostudies-literature
| S-EPMC6288579 | biostudies-other
| S-EPMC3130546 | biostudies-literature
| S-EPMC1978487 | biostudies-literature
| S-EPMC3188388 | biostudies-literature
| S-EPMC8612378 | biostudies-literature