Dataset Information

Relationship estimation from whole-genome sequence data.

ABSTRACT: The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships.
Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
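
The idea of identifying and masking regions with excess IBD can be illustrated with a minimal sketch. This is not ERSA 2.0's actual procedure, which this record does not describe. It assumes that IBD segments among unrelated control pairs are available as (pair, chromosome, start, end) tuples, that the genome is binned into fixed windows, and that a window is "excess" when more than a chosen fraction of control pairs share IBD there. The window size, the threshold, and all function names are hypothetical.

```python
WINDOW = 1_000_000  # hypothetical window size in base pairs

def windows_of(start, end, window=WINDOW):
    """Indices of the fixed-size windows overlapped by the half-open interval [start, end)."""
    return range(start // window, (end - 1) // window + 1)

def find_excess_windows(segments, n_pairs, max_fraction=0.05, window=WINDOW):
    """Flag (chrom, window) bins where more than max_fraction of control pairs share IBD."""
    pairs_per_bin = {}
    for pair, chrom, start, end in segments:
        for w in windows_of(start, end, window):
            pairs_per_bin.setdefault((chrom, w), set()).add(pair)
    return {b for b, pairs in pairs_per_bin.items()
            if len(pairs) / n_pairs > max_fraction}

def mask_segments(segments, flagged, window=WINDOW):
    """Drop every segment that overlaps a flagged bin (a sketch could also trim instead)."""
    return [s for s in segments
            if not any((s[1], w) in flagged for w in windows_of(s[2], s[3], window))]

# Toy control data (made up for illustration): three of four pairs share IBD
# in the same 1 Mb window on chr1, one pair has a segment on chr2.
control_segments = [
    (("a", "b"), "chr1", 5_200_000, 5_800_000),
    (("a", "c"), "chr1", 5_100_000, 5_900_000),
    (("b", "c"), "chr1", 5_300_000, 5_700_000),
    (("a", "d"), "chr2", 1_000_000, 2_500_000),
]
flagged = find_excess_windows(control_segments, n_pairs=4, max_fraction=0.5)
kept = mask_segments(control_segments, flagged)
```

In this toy input the chr1 window is flagged because three of the four control pairs share IBD there, so only the chr2 segment survives masking. Dropping whole segments is the simplest choice; trimming segments at the boundaries of flagged windows would retain more information.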

SUBMITTER: Li H

PROVIDER: S-EPMC3907355 | biostudies-literature | 2014 Jan

REPOSITORIES: biostudies-literature

Publications

Relationship estimation from whole-genome sequence data.

Hong Li, Gustavo Glusman, Hao Hu, Shankaracharya, Juan Caballero, Robert Hubley, David Witherspoon, Stephen L Guthery, Denise E Mauldin, Lynn B Jorde, Leroy Hood, Jared C Roach, Chad D Huff

PLoS Genetics, 2014 Jan 30, issue 1

PMID: 24497848

Similar Datasets

Estimating telomere length from whole genome sequence data.
| S-EPMC4027178 | biostudies-literature

Genotype phasing in pedigrees using whole-genome sequence data.
| S-EPMC7253450 | biostudies-literature

A genome-wide scan statistic framework for whole-genome sequence data analysis.
| S-EPMC6616627 | biostudies-literature

Identifying mixed Mycobacterium tuberculosis infections from whole genome sequence data.
| S-EPMC6092779 | biostudies-literature

qmotif: determination of telomere content from whole-genome sequence data.
| S-EPMC9710677 | biostudies-literature

TIGER: inferring DNA replication timing from whole-genome sequence data.
| S-EPMC8913259 | biostudies-literature

Use of whole genome sequence data to infer baculovirus phylogeny.
| S-EPMC115056 | biostudies-literature

Prokaryotic phylogenies inferred from whole-genome sequence and annotation data.
| S-EPMC3773407 | biostudies-literature

PathogenFinder--distinguishing friend from foe using bacterial whole genome sequence data.
| S-EPMC3810466 | biostudies-literature

Comparison of structural variant callers for massive whole-genome sequence data.
| S-EPMC10976732 | biostudies-literature