
Dataset Information


Identification of genetic outliers due to sub-structure and cryptic relationships.


ABSTRACT:

Motivation

To minimize the effects of genetic confounding in high-throughput genetic association studies, such as sequencing studies (including whole-genome sequencing, WGS) and genome-wide association studies (GWAS), we propose a general framework to assess, and formally test for, genetic heterogeneity among study subjects. Because the approach fully exploits the information on recent ancestry that is captured by rare variants, it is especially powerful in WGS studies. Even at moderate sample sizes, the proposed testing framework can identify study subjects who are genetically too similar (e.g. cryptic relatedness) or genetically too different (e.g. population substructure). The approach is computationally fast, which makes it applicable to whole-genome sequencing data, and it is straightforward to implement.
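The abstract states the key idea only in words: alleles that are rare in the sample but shared by two subjects carry information about recent common ancestry. The Python sketch below illustrates that idea under stated assumptions; it is not the STEGO statistic and not the API of the stego R package. It assumes a genotype matrix of minor-allele counts (0, 1, 2) and weights each variant by 1/(2p_k)^2, so that if p_k were the true population frequency, an unrelated pair under Hardy-Weinberg equilibrium would contribute 1 per variant in expectation. It then flags pairs whose similarity z-score exceeds a cutoff of 4, which is an arbitrary choice. The function names `weighted_similarity` and `flag_outlier_pairs` are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def weighted_similarity(genotypes: np.ndarray) -> np.ndarray:
    """Hypothetical pairwise similarity that up-weights shared rare alleles.

    genotypes: (n_subjects, n_variants) array of minor-allele counts in {0, 1, 2}.
    Variant k gets weight 1 / (2 p_k)^2. If p_k were the true population
    frequency, E[G_ik * G_jk] * w_k = 1 for an unrelated pair under HWE.
    Here p_k is the sample frequency, an assumption made for simplicity.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    p = g.sum(axis=0) / (2.0 * n)        # sample minor-allele frequency per variant
    keep = p > 0                          # monomorphic variants carry no information
    g, p = g[:, keep], p[keep]
    w = 1.0 / (2.0 * p) ** 2              # rarer variants receive larger weights
    return (g * w) @ g.T / g.shape[1]     # average weighted allele sharing per pair


def flag_outlier_pairs(sim: np.ndarray, z_threshold: float = 4.0) -> List[Tuple[int, int, float]]:
    """Return (i, j, z) for pairs i < j whose similarity z-score exceeds the cutoff in absolute value.

    Positive z suggests subjects that are too similar (e.g. relatives or duplicates);
    negative z suggests subjects that are too different (e.g. substructure).
    """
    n = sim.shape[0]
    iu = np.triu_indices(n, k=1)          # each unordered pair once, diagonal excluded
    vals = sim[iu]
    sd = vals.std()
    if sd == 0:
        return []
    z = (vals - vals.mean()) / sd
    hits = np.abs(z) > z_threshold
    return [(int(i), int(j), float(zz)) for i, j, zz in zip(iu[0][hits], iu[1][hits], z[hits])]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    freqs = rng.uniform(0.005, 0.05, size=3000)   # simulated rare-variant frequencies
    G = rng.binomial(2, freqs, size=(60, 3000))   # 60 simulated unrelated subjects
    G[1] = G[0]                                   # make subject 1 a duplicate of subject 0
    S = weighted_similarity(G)
    for i, j, z in flag_outlier_pairs(S):
        print(f"subjects {i} and {j}: z = {z:.1f}")
```

In this simulation the duplicated pair should stand out because it shares every rare allele that either subject carries. The pairwise similarities are strongly right-skewed, and the outlying pairs themselves inflate the mean and standard deviation, so a fixed z-score cutoff is only a crude stand-in for a formal test of the kind the abstract describes.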

Results

Simulation studies illustrate the overall performance of our approach. In an application to the 1000 Genomes Project, we outline an analysis and data-cleaning pipeline that uses our approach to formally assess whether study subjects are related and whether population substructure is present. In the 1000 Genomes Project data, our approach revealed subjects who are most likely related but had previously passed standard QC filters.

Availability and implementation

An implementation of our method, Similarity Test for Estimating Genetic Outliers (STEGO), is available in the R package stego from GitHub at https://github.com/dschlauch/stego .

Contact

dschlauch@fas.harvard.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Schlauch D 

PROVIDER: S-EPMC5870703 | biostudies-literature | 2017 Jul

REPOSITORIES: biostudies-literature


Publications

Identification of genetic outliers due to sub-structure and cryptic relationships.

Daniel Schlauch, Heide Fier, Christoph Lange

Bioinformatics (Oxford, England), 1 July 2017, issue 13



Similar Datasets

S-EPMC3071720 | biostudies-other
S-EPMC7264336 | biostudies-literature
S-EPMC1538801 | biostudies-literature
PXD010614 | Pride (2019-11-12)
S-EPMC9872813 | biostudies-literature
S-EPMC3378896 | biostudies-other
S-EPMC5436028 | biostudies-literature
S-EPMC6749445 | biostudies-literature
S-EPMC6434890 | biostudies-literature
S-EPMC1219638 | biostudies-other