
Dataset Information

HLA imputation in an admixed population: An assessment of the 1000 Genomes data as a training set.


ABSTRACT: Methods to impute HLA alleles based on dense single nucleotide polymorphism (SNP) data provide a valuable resource for association studies and evolutionary investigation of the MHC region. The availability of appropriate training sets is critical to the accuracy of HLA imputation, and the inclusion of samples with various ancestries is an important prerequisite in studies of admixed populations. We assess the accuracy of HLA imputation using 1000 Genomes Project data as a training set, applying it to a highly admixed Brazilian population, the Quilombos from the state of São Paulo. To assess accuracy, we compared imputed and experimentally determined genotypes for 146 samples at four classical HLA loci. We found imputation accuracies of 82.9%, 81.8%, 94.8%, and 86.6% for HLA-A, -B, -C, and -DRB1, respectively (two-field resolution). Accuracies improved when we included a subset of Quilombo individuals in the training set. We conclude that the 1000 Genomes data are a valuable resource for constructing training sets because of the diversity of ancestries and the potential for a large overlap of SNPs with the target population. We also show that tailoring training sets to features of the target population substantially enhances imputation accuracy.
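
The comparison of imputed and experimentally determined genotypes described in the abstract can be illustrated with a minimal Python sketch. It assumes that accuracy is scored per allele (each sample contributes two alleles per locus, matched without regard to order) after truncating allele names to two fields; the abstract does not state the exact scoring rule, so this is one plausible reading, not the authors' procedure. The function names and the toy genotypes are hypothetical and are not taken from the study.

```python
from collections import Counter


def two_field(allele: str) -> str:
    """Truncate an HLA allele name such as 'A*02:01:01' to two fields ('A*02:01')."""
    return ":".join(allele.split(":")[:2])


def matching_alleles(imputed, typed) -> int:
    """Count alleles shared by two unordered genotypes (0, 1 or 2)."""
    a = Counter(two_field(x) for x in imputed)
    b = Counter(two_field(x) for x in typed)
    # Multiset intersection keeps the minimum count of each allele,
    # so a homozygous call only matches twice if both genotypes are homozygous.
    return sum((a & b).values())


def allele_accuracy(imputed_calls, typed_calls) -> float:
    """Fraction of alleles imputed correctly across samples present in both inputs."""
    shared = imputed_calls.keys() & typed_calls.keys()
    if not shared:
        raise ValueError("no samples in common")
    correct = sum(matching_alleles(imputed_calls[s], typed_calls[s]) for s in shared)
    return correct / (2 * len(shared))


# Toy genotypes for a single locus (invented for illustration only).
imputed = {
    "s1": ("A*02:01", "A*24:02"),
    "s2": ("A*01:01", "A*03:01"),
    "s3": ("A*02:01", "A*02:01"),
}
typed = {
    "s1": ("A*24:02", "A*02:01:01"),
    "s2": ("A*01:01", "A*11:01"),
    "s3": ("A*02:01", "A*68:01"),
}

accuracy = allele_accuracy(imputed, typed)
print(f"{accuracy:.3f}")  # prints 0.667 (4 of 6 alleles match)
```

Scoring per allele rather than per genotype gives partial credit when one of the two alleles is imputed correctly; a stricter per-genotype score would count samples s2 and s3 above as fully wrong.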

SUBMITTER: Nunes K 

PROVIDER: S-EPMC5609807 | biostudies-literature | 2016 Mar

REPOSITORIES: biostudies-literature

Publications

HLA imputation in an admixed population: An assessment of the 1000 Genomes data as a training set.

Kelly Nunes, Xiuwen Zheng, Margareth Torres, Maria Elisa Moraes, Bruno Z Piovezan, Gerlandia N Pontes, Lilian Kimura, Juliana E P Carnavalli, Regina C Mingroni Netto, Diogo Meyer

Human Immunology, 2015-11-12, issue 3


Similar Datasets

S-EPMC4580532 | biostudies-literature
S-EPMC3511547 | biostudies-literature
S-EPMC3819389 | biostudies-literature
S-EPMC4079705 | biostudies-literature
S-EPMC3703942 | biostudies-literature
S-EPMC5090169 | biostudies-literature
S-EPMC5532257 | biostudies-literature
S-EPMC11353365 | biostudies-literature
S-EPMC3572961 | biostudies-literature
S-EPMC3376268 | biostudies-literature