Dataset Information

SweHLA: the high confidence HLA typing bio-resource drawn from 1000 Swedish genomes.

ABSTRACT: There is a need to accurately call human leukocyte antigen (HLA) genes from existing short-read sequencing data; however, there is no single solution that matches the gold standard of Sanger sequenced lab typing. Here we aimed to combine results from available software programs, minimizing the biases of applied algorithm and HLA reference. The result is a robust HLA population resource for the published 1000 Swedish genomes, and a framework for future HLA interrogation. HLA 2nd-field alleles were called using four imputation and inference methods for the classical eight genes (class I: HLA-A, HLA-B, HLA-C; class II: HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRB1). A high confidence population set (SweHLA) was determined using an n-1 concordance rule for class I (four programs) and class II (three programs) alleles. Results were compared across populations, and individual programs were benchmarked to SweHLA. Per gene, 875 to 988 of the 1000 samples were genotyped in SweHLA; 920 samples had at least seven loci called. While a small fraction of reference alleles were common to all software (class I = 1.9% and class II = 4.1%), this did not affect the overall call rate. Gene-level concordance was high compared to European populations (>0.83%), with COX and PGF the dominant SweHLA haplotypes. We noted that 15/18 discordant alleles (delta allele frequency >2) were previously reported as disease-associated. These differences could in part explain across-study genetic replication failures, reinforcing the need to use multiple software solutions. SweHLA demonstrates a way to use existing NGS data to generate a population resource agnostic to individual HLA software biases.
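
The n-1 concordance rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `consensus_call`, the representation of a genotype as a pair of 2nd-field allele strings, and the choice to compare whole genotypes (rather than single alleles) are all assumptions made for illustration. The idea shown is only that a call is kept when at least n-1 of the n programs agree.

```python
from collections import Counter

def consensus_call(calls):
    """Return the genotype supported by at least n-1 of n programs, else None.

    `calls` holds one entry per program for a single sample and gene:
    a pair of 2nd-field allele strings, or None when a program made no call.
    (Hypothetical helper; the data layout is an assumption.)
    """
    n = len(calls)
    if n < 3:
        # With fewer than three programs, n-1 agreement is not meaningful.
        raise ValueError("need at least three programs")
    # Sort each pair so that allele order within a genotype does not matter.
    normalized = [tuple(sorted(c)) for c in calls if c is not None]
    if not normalized:
        return None
    genotype, votes = Counter(normalized).most_common(1)[0]
    return genotype if votes >= n - 1 else None

# Class I example: four programs, three of which agree.
class1 = [
    ("A*02:01", "A*01:01"),
    ("A*01:01", "A*02:01"),
    ("A*01:01", "A*02:01"),
    ("A*01:01", "A*03:01"),
]
# Class II example: three programs, only one of which made this call.
class2 = [
    ("DRB1*15:01", "DRB1*03:01"),
    ("DRB1*15:01", "DRB1*04:01"),
    None,
]
print(consensus_call(class1))
print(consensus_call(class2))
```

Under this sketch, a sample with no n-1 agreement for a gene is left uncalled, which is one plausible reading of why the per-gene counts in SweHLA fall below 1000.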

SUBMITTER: Nordin J

PROVIDER: S-EPMC7170882 | biostudies-literature | 2020 May

REPOSITORIES: biostudies-literature

Publications

SweHLA: the high confidence HLA typing bio-resource drawn from 1000 Swedish genomes.

Nordin J, Ameur A, Lindblad-Toh K, Gyllensten U, Meadows JRS

European Journal of Human Genetics: EJHG, 2019-12-16, issue 5

PMID: 31844174

Similar Datasets

HLA typing from 1000 genomes whole genome and whole exome illumina data.
| S-EPMC3819389 | biostudies-literature

HLA diversity in the 1000 genomes dataset.
| S-EPMC4079705 | biostudies-literature

IndiGenomes: a comprehensive resource of genetic variants from over 1000 Indian genomes.
| S-EPMC7778947 | biostudies-literature

arcasHLA: high-resolution HLA typing from RNAseq.
| S-EPMC6956775 | biostudies-literature

Ultraspecific probes for high throughput HLA typing.
| S-EPMC2661095 | biostudies-literature

1000 Genomes Phase 3 aCGH samples
2015-07-01 | GSE70188 | GEO

High level of inbreeding in final phase of 1000 Genomes Project.
| S-EPMC4667178 | biostudies-literature

HLA imputation in an admixed population: An assessment of the 1000 Genomes data as a training set.
| S-EPMC5609807 | biostudies-literature

Transposable element insertions in 1000 Swedish individuals.
| S-EPMC10381067 | biostudies-literature

1000 Genomes Phase 3 aCGH samples
2015-07-01 | E-GEOD-70188 | biostudies-arrayexpress