Dataset Information

A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank.

ABSTRACT: With increasing biobanking efforts connecting electronic health records and national registries to germline genetics, the time-to-event data analysis has attracted increasing attention in the genetics studies of human diseases. In time-to-event data analysis, the Cox proportional hazards (PH) regression model is one of the most used approaches. However, existing methods and tools are not scalable when analyzing a large biobank with hundreds of thousands of samples and endpoints, and they are not accurate when testing low-frequency and rare variants. Here, we propose a scalable and accurate method, SPACox (a saddlepoint approximation implementation based on the Cox PH regression model), that is applicable for genome-wide scale time-to-event data analysis. SPACox requires fitting a Cox PH regression model only once across the genome-wide analysis and then uses a saddlepoint approximation (SPA) to calibrate the test statistics. Simulation studies show that SPACox is 76-252 times faster than other existing alternatives, such as gwasurvivr, 185-511 times faster than the standard Wald test, and more than 6,000 times faster than the Firth correction and can control type I error rates at the genome-wide significance level regardless of minor allele frequencies. Through the analysis of UK Biobank inpatient data of 282,871 white British European ancestry samples, we show that SPACox can efficiently analyze large sample sizes and accurately control type I error rates. We identified 611 loci associated with time-to-event phenotypes of 12 common diseases, of which 38 loci would be missed within a logistic regression framework with a binary phenotype defined as event occurrence status during the follow-up period.
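
The calibration step the abstract describes (fit the null model once, then use a saddlepoint approximation for each variant's test statistic) can be illustrated with a generic sketch. This is not the SPACox implementation. It assumes fixed null-model residuals `r_i`, genotypes drawn as Binomial(2, MAF), a score statistic S = sum_i r_i (G_i - 2 MAF), and the Barndorff-Nielsen tail formula. The function names `cgf` and `spa_upper_tail` are hypothetical.

```python
import math
from statistics import NormalDist


def _tilted_prob(p, x):
    """Return p*e^x / (1 - p + p*e^x), computed without overflow."""
    if x >= 0:
        return p / (p + (1.0 - p) * math.exp(-x))
    e = math.exp(x)
    return p * e / (1.0 - p + p * e)


def _log_mgf(p, x):
    """Return log(1 - p + p*e^x), computed without overflow."""
    if x > 0:
        return x + math.log(p + (1.0 - p) * math.exp(-x))
    return math.log(1.0 - p + p * math.exp(x))


def cgf(t, residuals, maf):
    """K(t), K'(t), K''(t) for S = sum_i r_i*(G_i - 2*maf), G_i ~ Binomial(2, maf).

    Assumption for illustration: residuals are fixed, genotypes are random.
    """
    k0 = k1 = k2 = 0.0
    for r in residuals:
        q = _tilted_prob(maf, r * t)
        k0 += 2.0 * _log_mgf(maf, r * t) - 2.0 * maf * r * t
        k1 += 2.0 * r * (q - maf)
        k2 += 2.0 * r * r * q * (1.0 - q)
    return k0, k1, k2


def spa_upper_tail(s, residuals, maf):
    """Approximate P(S >= s) with the Barndorff-Nielsen saddlepoint formula."""
    sd0 = math.sqrt(cgf(0.0, residuals, maf)[2])
    if abs(s) < 1e-4 * sd0:
        # The formula is singular at the mean; fall back to the normal tail.
        return NormalDist().cdf(-s / sd0)
    # K'(t) is increasing, so widen the bracket until K'(lo) < s < K'(hi).
    lo, hi = -1.0, 1.0
    for _ in range(60):
        if cgf(lo, residuals, maf)[1] < s < cgf(hi, residuals, maf)[1]:
            break
        lo, hi = 2.0 * lo, 2.0 * hi
    else:
        raise ValueError("s lies outside the attainable range of S")
    # Plain bisection for the saddlepoint equation K'(t) = s.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cgf(mid, residuals, maf)[1] < s:
            lo = mid
        else:
            hi = mid
    t_hat = 0.5 * (lo + hi)
    k0, _, k2 = cgf(t_hat, residuals, maf)
    w = math.copysign(math.sqrt(2.0 * (t_hat * s - k0)), t_hat)
    v = t_hat * math.sqrt(k2)
    return NormalDist().cdf(-(w + math.log(v / w) / w))
```

With skewed residuals and a rare variant, for example `spa_upper_tail(2.0, [0.9] * 5 + [-0.05] * 90, 0.01)`, this sketch returns a tail probability many orders of magnitude larger than the plain normal approximation. That gap is the kind of rare-variant miscalibration a saddlepoint correction is meant to address.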

SUBMITTER: Bi W

PROVIDER: S-EPMC7413891 | biostudies-literature | 2020 Aug

REPOSITORIES: biostudies-literature

Publications

A Fast and Accurate Method for Genome-Wide Time-to-Event Data Analysis and Its Application to UK Biobank.

Wenjian Bi, Lars G. Fritsche, Bhramar Mukherjee, Sehee Kim, Seunggeun Lee

American Journal of Human Genetics, 2020 Jun 25, issue 2

PMID: 32589924

Similar Datasets

A Fast and Accurate Method for Genome-wide Scale Phenome-wide G × E Analysis and Its Application to UK Biobank. | S-EPMC6904814 | biostudies-literature

Fast and accurate long-range phasing in a UK Biobank cohort. | S-EPMC4925291 | biostudies-literature

A fast and accurate method for detection of IBD shared haplotypes in genome-wide SNP data. | S-EPMC5437913 | biostudies-literature

Phenome-wide heritability analysis of the UK Biobank. | S-EPMC5400281 | biostudies-literature

A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank. | S-EPMC7641476 | biostudies-literature

Genome-wide analyses using UK Biobank data provide insights into the genetic architecture of osteoarthritis. | S-EPMC5896734 | biostudies-literature

Identification of new therapeutic targets for osteoarthritis through genome-wide analyses of UK Biobank data. | S-EPMC6400267 | biostudies-literature

Predictive Big Data Analytics using the UK Biobank Data. | S-EPMC6461626 | biostudies-literature

Application of Correlated Time-to-Event Models to Ecological Momentary Assessment Data. | S-EPMC5050055 | biostudies-literature

Prediction of a time-to-event trait using genome wide SNP data. | S-EPMC3651372 | biostudies-literature