Dataset Information

RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID.


ABSTRACT: Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI leverages the efficient RaPID method to call IBD segments first, then estimates ϕ and π0 from the detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admix events, and varying marker densities, and achieves higher accuracy compared to KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
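
ILLUSTRATION: The abstract describes estimating ϕ and π0 from detected IBD segments. The sketch below shows only the textbook relationship between IBD sharing fractions and those two quantities (ϕ = π1/4 + π2/2, π0 = 1 - π1 - π2). It is not RAFFI's implementation and omits the data-driven adjustment for phasing and genotyping quality. The names `PairSharing`, `sharing_fractions`, `kinship`, and `degree`, the 3400 cM genome length, and the powers-of-two degree cutoffs (in the style commonly used with KING) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PairSharing:
    """Hypothetical per-pair summary of detected IBD segments."""
    ibd1_cm: float  # total length (cM) shared IBD on exactly one haplotype
    ibd2_cm: float  # total length (cM) shared IBD on both haplotypes


def sharing_fractions(pair: PairSharing, genome_cm: float):
    """Return (pi0, pi1, pi2): genome fractions sharing 0, 1, or 2 haplotypes IBD."""
    pi1 = pair.ibd1_cm / genome_cm
    pi2 = pair.ibd2_cm / genome_cm
    pi0 = 1.0 - pi1 - pi2
    return pi0, pi1, pi2


def kinship(pi1: float, pi2: float) -> float:
    """Kinship coefficient from IBD sharing fractions: phi = pi1/4 + pi2/2."""
    return pi1 / 4.0 + pi2 / 2.0


def degree(phi: float) -> str:
    """Map phi to a relatedness degree using powers-of-two cutoffs (assumed)."""
    if phi > 2 ** -1.5:
        return "duplicate/MZ twin"
    if phi > 2 ** -2.5:
        return "1st degree"
    if phi > 2 ** -3.5:
        return "2nd degree"
    if phi > 2 ** -4.5:
        return "3rd degree"
    return "unrelated or more distant"


# Idealized full siblings: half the genome IBD1, a quarter IBD2.
GENOME_CM = 3400.0  # assumed total genetic length, for illustration only
sibs = PairSharing(ibd1_cm=1700.0, ibd2_cm=850.0)
pi0, pi1, pi2 = sharing_fractions(sibs, GENOME_CM)
phi = kinship(pi1, pi2)
print(phi, pi0, degree(phi))
```

Both full siblings and parent-offspring pairs have an expected ϕ of 0.25, which is why π0 is reported alongside ϕ: a parent-offspring pair is expected to have π0 near 0, while full siblings are expected to have π0 near 0.25.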

SUBMITTER: Naseri A 

PROVIDER: S-EPMC7853505 | biostudies-literature | 2021 Jan

REPOSITORIES: biostudies-literature

Publications

RAFFI: Accurate and fast familial relationship inference in large scale biobank studies using RaPID.

Naseri A, Shi J, Lin X, Zhang S, Zhi D

PLoS Genetics, 2021 Jan 21 (1)


Similar Datasets

S-EPMC6659282 | biostudies-literature
S-EPMC3141274 | biostudies-literature
S-EPMC6659841 | biostudies-literature
S-EPMC6298048 | biostudies-literature
S-EPMC7754763 | biostudies-literature
S-EPMC10164567 | biostudies-literature
S-EPMC4576124 | biostudies-literature
S-EPMC2760878 | biostudies-other
S-EPMC3129529 | biostudies-literature
S-EPMC10680832 | biostudies-literature