Systematic assessment of imputation performance using the 1000 Genomes reference panels.
ABSTRACT: Genotype imputation has been widely adopted in the post-genome-wide association study (GWAS) era. Because it can accurately predict the genotypes of untyped variants, imputation greatly increases variant density, enabling fine-mapping of GWAS loci and large-scale meta-analyses across different genotyping arrays. Using genotype data from 90 deeply whole-genome-sequenced individuals as the evaluation benchmark and 1000 Genomes Project data as reference panels, we systematically examined four important issues in genotype imputation practice. First, in a study of imputation accuracy, we found that IMPUTE2 and minimac performed best among the three popular imputation programs evaluated and that using a multi-population reference panel is beneficial. Second, the optimal imputation quality cutoff for removing poorly imputed variants varies with the software used. Third, the major factors contributing to consistently poor imputation are low variant heterozygosity, high sequence similarity to other genomic regions, high GC content, segmental duplication, and large distance from genotyping markers. Lastly, in an evaluation of the imputability of all known GWAS regions, we found that GWAS loci associated with hematological measurements and immune system diseases are harder to impute than loci for other human traits. Recommendations based on these findings may provide practical guidance for imputation in future genetic studies.
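The sketch below illustrates one way to weigh a quality cutoff against accuracy when sequenced genotypes are available as truth, as with the 90-sample benchmark. It is a minimal illustration under stated assumptions, not the authors' pipeline. Accuracy is measured here as the squared Pearson correlation between imputed dosage and true genotype (dosage r²). The `quality` field stands in for a software-reported score such as IMPUTE2's info or minimac's Rsq. The helper names, toy data and cutoff values are all hypothetical.

```python
import math
from statistics import mean

def dosage_r2(dosages, truth):
    """Squared Pearson correlation between imputed dosages (0-2) and true genotypes (0/1/2)."""
    mx, my = mean(dosages), mean(truth)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, truth))
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in truth)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when dosages or true genotypes are constant
    return sxy * sxy / (sxx * syy)

def evaluate_cutoff(variants, cutoff):
    """Keep variants whose quality score is >= cutoff; return (count kept, mean dosage r^2)."""
    kept = [v for v in variants if v["quality"] >= cutoff]
    scores = [dosage_r2(v["dosages"], v["truth"]) for v in kept]
    scores = [s for s in scores if not math.isnan(s)]  # drop undefined (monomorphic) cases
    return len(kept), (mean(scores) if scores else float("nan"))

# Hypothetical toy data: one well-imputed and one poorly imputed variant, four samples each.
variants = [
    {"id": "var1", "quality": 0.95, "dosages": [0.0, 1.0, 2.0, 1.0], "truth": [0, 1, 2, 1]},
    {"id": "var2", "quality": 0.30, "dosages": [1.0, 0.9, 1.1, 1.0], "truth": [0, 2, 1, 1]},
]

# Hypothetical cutoffs; in practice, scan a grid separately for each imputation program.
for cutoff in (0.0, 0.5, 0.8):
    n_kept, mean_r2 = evaluate_cutoff(variants, cutoff)
    print(f"cutoff={cutoff}: kept={n_kept}, mean r2={mean_r2:.3f}")
```

Scanning cutoffs separately for each program, rather than applying a single threshold to all of them, reflects the observation that the optimal cutoff depends on the software used.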
SUBMITTER: Liu Q
PROVIDER: S-EPMC4580532 | biostudies-literature | 2015 Jul
REPOSITORIES: biostudies-literature