
Dataset Information


Marked variation in predicted and observed variability of tandem repeat loci across the human genome.


ABSTRACT:

Background

Tandem repeat (TR) variants in the human genome play key roles in a number of diseases. However, current models predicting variability are based on limited training sets. We conducted a systematic analysis of TRs of unit lengths 2-12 nucleotides in Whole Genome Shotgun (WGS) sequences to define the extent of variation of 209,214 unique repeat loci throughout the genome.

Results

We applied a multivariate statistical model to predict TR variability. Predicted heterozygosity correlated more strongly with heterozygosity in the CEPH polymorphism database (rho = 0.29, p < 0.0005) than the WGS data did (rho = 0.17), presumably because the model smooths noise arising from small sample sizes. A multivariate logistic model of 8 parameters accounted for 36% of the variation in the WGS data. Validation studies of 70 experimentally investigated TRs revealed high concordance with the model's predictions (p < 0.0001).

Conclusion

Variability among 2-12-mer TRs in the genome can be modeled by a few parameters, which do not differ markedly by unit length, consistent with a common mechanism generating variability among such TRs. The distributions of observed and predicted variants across the genome were generally concordant, indicating that the repeat variation dataset does not exhibit strong regional ascertainment biases. The comparison also revealed a deficit of variant repeats on chromosomes 19 and Y, likely reflecting a reduction in 2-mer repeats in the former and a reduced level of recombination in the latter, and excesses on chromosomes 6, 13, 20 and 21.

SUBMITTER: O'Dushlaine CT 

PROVIDER: S-EPMC2364633 | biostudies-literature | 2008 Apr

REPOSITORIES: biostudies-literature


Publications

Marked variation in predicted and observed variability of tandem repeat loci across the human genome.

O'Dushlaine CT, Shields DC

BMC Genomics, 2008 Apr 16



Similar Datasets

S-EPMC4027155 | biostudies-literature
S-EPMC3018141 | biostudies-literature
S-EPMC5698871 | biostudies-other
S-EPMC7304526 | biostudies-literature
S-EPMC1273636 | biostudies-literature
S-EPMC6218588 | biostudies-literature
S-EPMC5533702 | biostudies-literature
S-EPMC2639597 | biostudies-literature
S-EPMC5105644 | biostudies-literature
S-EPMC8275641 | biostudies-literature