Dataset Information

A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

ABSTRACT:

Background

Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance.

Results

We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway.
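The Results paragraph describes the method only at a high level. Below is a minimal sketch of the general idea. It is not the authors' exact formulation, and it rests on these assumptions:

- The pathway is an undirected gene graph.
- Candidate clusters are connected gene sets up to a hypothetical size cap, `max_size`.
- Each cluster is scored with a Kulldorff-style Bernoulli log-likelihood ratio. This compares cases and controls who carry a rare CNV overlapping the cluster against everyone else.
- Significance of the *maximum* score is assessed by permuting case-control labels. Because every permuted dataset is scanned in full, a single p-value covers all overlapping clusters at once. This is one way a scan limits the overall false positive probability.

All function names and the toy data are hypothetical.

```python
import math
import random
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Hypothetical representation: an undirected pathway graph stored as a
# symmetric adjacency map (gene -> neighbouring genes).
Graph = Dict[str, Set[str]]


def connected_clusters(graph: Graph, max_size: int) -> Set[FrozenSet[str]]:
    """Enumerate every connected gene set with at most max_size genes."""
    found: Set[FrozenSet[str]] = set()
    frontier = {frozenset([gene]) for gene in graph}
    while frontier:
        found |= frontier
        grown: Set[FrozenSet[str]] = set()
        for cluster in frontier:
            if len(cluster) >= max_size:
                continue
            # Add one neighbouring gene, so the grown set stays connected.
            for gene in cluster:
                for neighbour in graph[gene] - cluster:
                    grown.add(cluster | {neighbour})
        frontier = grown - found
    return found


def _xlog(x: float, total: float) -> float:
    """x * log(x / total), using the convention 0 * log(0) = 0."""
    return x * math.log(x / total) if x > 0 else 0.0


def bernoulli_llr(c_in: int, n_in: int, cases: int, total: int) -> float:
    """Kulldorff-style Bernoulli log-likelihood ratio for one cluster.

    c_in: cases carrying a CNV that overlaps the cluster; n_in: all such
    carriers. Returns 0 unless the carriers are enriched for cases.
    """
    c_out, n_out = cases - c_in, total - n_in
    if n_in == 0 or n_out == 0 or c_in / n_in <= c_out / n_out:
        return 0.0
    ll_alt = (_xlog(c_in, n_in) + _xlog(n_in - c_in, n_in)
              + _xlog(c_out, n_out) + _xlog(n_out - c_out, n_out))
    ll_null = _xlog(cases, total) + _xlog(total - cases, total)
    return ll_alt - ll_null


def scan(clusters: Iterable[FrozenSet[str]], cnv_genes: List[Set[str]],
         is_case: List[bool]) -> Tuple[float, FrozenSet[str]]:
    """Maximum LLR over all clusters, and the cluster that attains it."""
    cases, total = sum(is_case), len(is_case)
    best_llr, best_cluster = 0.0, frozenset()
    # Deterministic order; on ties the smaller cluster is kept.
    for cluster in sorted(clusters, key=lambda z: (len(z), sorted(z))):
        carriers = [i for i, genes in enumerate(cnv_genes) if genes & cluster]
        c_in = sum(is_case[i] for i in carriers)
        llr = bernoulli_llr(c_in, len(carriers), cases, total)
        if llr > best_llr:
            best_llr, best_cluster = llr, cluster
    return best_llr, best_cluster


def permutation_pvalue(clusters: Set[FrozenSet[str]], cnv_genes: List[Set[str]],
                       is_case: List[bool], n_perm: int = 199, seed: int = 0):
    """Monte Carlo p-value for the maximum statistic over all clusters."""
    observed, best = scan(clusters, cnv_genes, is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any case/control association
        if scan(clusters, cnv_genes, labels)[0] >= observed:
            exceed += 1
    return best, observed, (exceed + 1) / (n_perm + 1)


# Toy example (hypothetical data): a six-gene linear pathway A-B-C-D-E-F.
pathway: Graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
                  "D": {"C", "E"}, "E": {"D", "F"}, "F": {"E"}}
# Genes hit by rare CNVs in each subject: 10 cases, then 10 controls.
cnv_genes = [{"B"}, {"C"}, {"B"}, {"C"}, {"B", "C"}, {"C"}, {"B"}, {"C"}, set(), {"F"},
             {"A"}, {"E"}, set(), set(), {"F"}, {"D"}, set(), {"A"}, set(), set()]
is_case = [True] * 10 + [False] * 10

clusters = connected_clusters(pathway, max_size=3)
best, observed, p_value = permutation_pvalue(clusters, cnv_genes, is_case,
                                             n_perm=199, seed=1)
print(sorted(best), round(observed, 3), p_value)
```

Brute-force enumeration of connected gene sets is only feasible for small pathways or small `max_size`. A genome-scale pathway would need a smarter search over candidate clusters.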

Conclusions

The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

SUBMITTER: Nishiyama T

PROVIDER: S-EPMC3130692 | biostudies-literature | 2011 May

REPOSITORIES: biostudies-literature


Publications

A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

Takeshi Nishiyama, Kunihiko Takahashi, Toshiro Tango, Dalila Pinto, Stephen W Scherer, Satoshi Takami, Hirohisa Kishino

BMC Bioinformatics, 2011 May 26


PMID: 21612662

Similar Datasets

Project description: Background: A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. When statistical methods are applied, the problem has two parts. First, spatial variation in risk must be detected across the study region. Second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and to disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM). A GAM can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether the residential location of subjects is associated with the outcome, i.e., whether the smoothing term is necessary. Permutation tests are a reasonable hypothesis-testing method and provide adequate power under a simple alternative hypothesis, but they have yet to be compared with other spatial statistics. Results: This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and to compare them with the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. Here the spatial scan statistic had the highest power, though the GAM methods did not fall far behind. Case 2 was a single point source located at the center of a circular cluster, and Case 3 was a line source at the center of the horizontal axis of a square study region. In both, the log odds decreased linearly with distance from the source. The GAM methods outperformed the scan statistic in Cases 2 and 3.
Sensitivity was measured as the proportion of the exposure source correctly identified as high or low risk. By this measure, the GAM methods outperformed the scan statistic in all three cases. Conclusions: The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had comparable or greater power and higher sensitivity than the spatial scan statistic.

Project description: BACKGROUND: In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were:

- to investigate the utility of the temporal scan statistic for detecting MRSA clusters;
- to validate clusters using molecular techniques and hospital records;
- to determine significant differences in the rate of MRSA cases using regression models.

METHODS: Patients admitted to a community hospital between August 2006 and February 2011 and identified with MRSA more than 48 hours after hospital admission were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale, and significant clusters were compared with MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. RESULTS: During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified, with spa type 2/t002 (88.6%) being the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. Of the remaining clusters, seven may have corresponded to true outbreaks and six demonstrated possible transmission events. The regression analysis indicated that years 2009-2011 (compared with 2006) and the months March and April (compared with January) were associated with an increase in the rate of MRSA cases (P ≤ 0.05).
CONCLUSIONS: The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital personnel. The increased MRSA rates in specific years and months may be attributable to several hospital-level factors, including the presence of other pathogens. Within hospitals, incorporating the temporal scan statistic into standard surveillance techniques is a valuable tool for healthcare workers to evaluate surveillance strategies and aid in the identification of MRSA clusters.

Project description: Immunoglobulin A nephropathy (IgAN) is a complex multifactorial disease whose genetic basis remains unknown. Distinct linkage and genome-wide association studies in both familial and sporadic IgAN suggest that there is a strong genetic component in IgAN. In this context, an intriguing role could be ascribed to copy number variants (CNVs), which have been recognized as an important source of genetic variation in humans. Here, we performed a whole-genome screening of CNVs in IgAN patients, their healthy relatives and healthy subjects (HS). A total of 217 individuals, consisting of 51 IgAN cases and 166 healthy relatives, were included in the initial screening. A high-throughput analysis of structural genetic variation, aimed at finding aberrations concordant across sample classes, identified 178 IgAN-specific aberrations: 114 losses and 64 gains. Several CNVs overlapped with regions highlighted by previous genome-wide genetic studies. Moreover, we found that IgAN patients with deteriorated renal function carried low copy numbers of a CNV on chromosome 3 (chr3_loss:52031010-52260722). This CNV contains the TLR9 gene, whose expression correlated significantly with the loss aberration in patients with progressive renal damage. Conversely, IgAN patients with normal renal function did not carry chr3_loss:52031010-52260722. In these patients, TLR9 mRNA was expressed at the same level as in HS while still correlating strongly with the CNV. In conclusion, we performed the first genome-wide CNV study in IgAN. It identified structural variants specific to IgAN patients and provides a collection of new candidate genes and loci that could help to dissect the complex genomic setting of the disease. Moreover, we identified a specific CNV spanning the TLR9 gene that could explain disease severity in IgAN patients.
