Dataset Information

Simpute: an efficient solution for dense genotypic data.


ABSTRACT: Single nucleotide polymorphism (SNP) data derived from array-based technology or massively parallel sequencing often contain missing values. Missing SNPs can bias the results of association analyses. To maximize information usage, imputation is often adopted to compensate for the missing data by filling in the most probable values. To better understand the available tools for this purpose, we compare the imputation performance of BEAGLE, IMPUTE, BIMBAM, SNPMStat, MACH, and PLINK on data generated by randomly masking genotypes from the International HapMap Phase III project. In addition, we propose a new algorithm called simple imputation (Simpute) that benefits from the high resolution of the SNPs in the array platform. Simpute does not require any reference data. Its main advantage is computational efficiency, with complexity O(mw + n), where n is the number of missing SNPs, w is the number of positions at which SNPs are missing, and m is the number of people considered. Simpute is suitable for routine screening of large-scale SNP genotyping data, particularly when the sample size is large and efficiency is a major concern in the analysis.
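The masking-based comparison described in the abstract can be sketched roughly as follows. This is an illustration only, not the Simpute algorithm, which the abstract does not specify. The imputer used here is a plain per-SNP majority-genotype baseline swapped in as a stand-in. The genotype coding (0/1/2 per call, `None` for missing) and all function names are assumptions made for this example.

```python
import random
from collections import Counter

MISSING = None  # assumed marker for a missing genotype call


def mask_genotypes(genotypes, rate, rng):
    """Copy the matrix (people x SNPs) and hide a random fraction of calls.

    Returns the masked copy and the list of (person, snp) cells that were hidden.
    """
    masked = [row[:] for row in genotypes]
    hidden = []
    for i, row in enumerate(masked):
        for j, call in enumerate(row):
            if call is not MISSING and rng.random() < rate:
                hidden.append((i, j))
                row[j] = MISSING
    return masked, hidden


def impute_by_mode(genotypes):
    """Stand-in imputer: fill each gap with the most common call at that SNP."""
    filled = [row[:] for row in genotypes]
    # Group the missing cells by SNP position.
    missing_by_snp = {}
    for i, row in enumerate(filled):
        for j, call in enumerate(row):
            if call is MISSING:
                missing_by_snp.setdefault(j, []).append(i)
    # For each affected position, scan all people once, then fill the gaps.
    for j, people in missing_by_snp.items():
        counts = Counter(row[j] for row in genotypes if row[j] is not MISSING)
        if not counts:
            continue  # no observed call at this SNP; leave the gaps as they are
        mode = counts.most_common(1)[0][0]
        for i in people:
            filled[i][j] = mode
    return filled


def accuracy(truth, imputed, hidden):
    """Fraction of hidden cells whose imputed call matches the true call."""
    if not hidden:
        return float("nan")
    correct = sum(truth[i][j] == imputed[i][j] for i, j in hidden)
    return correct / len(hidden)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 50 people x 20 SNPs with random calls (not HapMap data).
    truth = [[rng.choice([0, 1, 2]) for _ in range(20)] for _ in range(50)]
    masked, hidden = mask_genotypes(truth, rate=0.05, rng=rng)
    print(accuracy(truth, impute_by_mode(masked), hidden))
```

Once the missing cells have been located, the fill step in this baseline makes one pass over the m people for each of the w affected positions and one assignment for each of the n missing calls. That has the same shape as the mw + n cost quoted in the abstract, although the abstract does not say how Simpute itself reaches that bound.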

SUBMITTER: Lin YJ 

PROVIDER: S-EPMC3581137 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Simpute: an efficient solution for dense genotypic data.

Lin Yen-Jen, Chang Chun-Tien, Tang Chuan Yi, Hsieh Wen-Ping

BioMed Research International, 2013-02-03


Similar Datasets

S-EPMC8363843 | biostudies-literature
S-EPMC5302086 | biostudies-literature
PRJEB26324 | ENA
S-EPMC2795971 | biostudies-literature
S-EPMC3898000 | biostudies-literature
S-EPMC3266881 | biostudies-literature
S-EPMC3852919 | biostudies-literature
S-EPMC8901535 | biostudies-literature
S-EPMC3827222 | biostudies-literature
S-EPMC3231954 | biostudies-literature