Project description:This study presents a method for genomic prediction that uses individual-level data and summary statistics from multiple populations. Genome-wide markers are now widely used to predict complex traits, and genomic prediction using multi-population data is an appealing approach to achieving higher prediction accuracy. However, sharing individual-level data across populations is not always possible. We present a method that enables integration of summary statistics from separate analyses with the available individual-level data. The data can consist of individuals with either single or multiple (weighted) phenotype records. We developed the method based on a hypothetical joint analysis model and absorption of population-specific information. We show that population-specific information is fully captured by the estimated allele substitution effects and the accuracy of those estimates, i.e., the summary statistics. The method gives results identical to those of a joint analysis of all individual-level data when complete summary statistics are available. We provide a series of easy-to-use approximations that can be used when complete summary statistics are unavailable or impractical to share. Simulations show that the approximations enable integration of different sources of information across a wide range of settings, yielding accurate predictions. The method can be readily extended to multiple traits. In summary, the developed method enables integration of genome-wide data, in the form of individual-level data or summary statistics from multiple populations, to obtain more accurate estimates of allele substitution effects and genomic predictions.
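The equivalence claimed above (summary statistics reproducing the joint analysis) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a ridge-regression (SNP-BLUP-style) model with no fixed effects, the same shrinkage parameter `lam` in every analysis, and a shared residual variance, and it takes "complete summary statistics" to mean the estimated effects `b2` plus the inverse of their prediction error covariance `C2_inv`. All names and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m, lam = 60, 80, 10, 5.0   # sample sizes, markers, shrinkage (all assumed)
true_b = rng.normal(size=m)        # simulated allele substitution effects


def simulate(n):
    """Simulate centred genotypes (0/1/2 coding) and phenotypes for one population."""
    X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    X -= X.mean(axis=0)
    y = X @ true_b + rng.normal(size=n)
    return X, y


X1, y1 = simulate(n1)   # population 1: individual-level data are available
X2, y2 = simulate(n2)   # population 2: only summary statistics are shared

# Separate analysis of population 2, run by its data owner.
lhs2 = X2.T @ X2 + lam * np.eye(m)
b2 = np.linalg.solve(lhs2, X2.T @ y2)   # estimated effects
C2_inv = lhs2                           # inverse prediction error covariance
                                        # (in units of the residual variance)

# Integration: population 1 data plus population 2 summary statistics.
# C2_inv already contains lam * I, so the penalty is not added a second time.
b_integrated = np.linalg.solve(X1.T @ X1 + C2_inv, X1.T @ y1 + C2_inv @ b2)

# Reference: joint ridge analysis of all individual-level data.
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
b_joint = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)

max_abs_diff = np.max(np.abs(b_integrated - b_joint))
print(f"max |integrated - joint| = {max_abs_diff:.2e}")
```

Under these assumptions the two solutions agree up to floating-point error, because `C2_inv @ b2` recovers the right-hand side of the population 2 analysis and `C2_inv` recovers its left-hand side. The approximations mentioned in the abstract would replace `C2_inv` with something cheaper to share; they are not sketched here.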
Project description:The accuracy of polygenic risk scores (PRSs) to predict complex diseases increases with the training sample size. PRSs are generally derived based on summary statistics from large meta-analyses of multiple genome-wide association studies (GWASs). However, it is now common for researchers to have access to large individual-level data as well, such as the UK Biobank data. To the best of our knowledge, it has not yet been explored how best to combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction. The most widely used approach to combine data is the meta-analysis of GWAS summary statistics (meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and using 12 real case-control and quantitative traits from both iPSYCH and UK Biobank along with external GWAS summary statistics, we compare meta-GWAS with two alternative data-combining approaches, stacked clumping and thresholding (SCT) and meta-PRS. We find that, when large individual-level data are available, the linear combination of PRSs (meta-PRS) is both a simple alternative to meta-GWAS and often more accurate.
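The difference between the two combining strategies named above can be sketched as follows. This is a generic illustration, not the study's pipeline: `ivw_meta` is a standard fixed-effect inverse-variance-weighted meta-analysis of per-variant effects, and `fit_meta_prs` fits the weights of a linear combination of PRSs by ordinary least squares on individual-level data. Both function names and the toy numbers are hypothetical, and a case-control trait would normally call for logistic rather than linear regression.

```python
import numpy as np


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas, ses: arrays of shape (n_studies, n_variants).
    Returns the combined effect and its standard error per variant.
    """
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    weights = 1.0 / ses**2
    beta = (weights * betas).sum(axis=0) / weights.sum(axis=0)
    se = np.sqrt(1.0 / weights.sum(axis=0))
    return beta, se


def fit_meta_prs(prs_matrix, y):
    """Least-squares weights (intercept first) for a linear combination of PRSs.

    prs_matrix: shape (n_individuals, n_scores), one column per PRS.
    """
    design = np.column_stack([np.ones(len(y)), prs_matrix])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Meta-GWAS: combine per-variant effects from two studies (toy values).
beta, se = ivw_meta([[0.10], [0.20]], [[0.10], [0.20]])
print(beta, se)

# Meta-PRS: combine two scores computed on the same individuals (toy values).
rng = np.random.default_rng(1)
prs = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * prs[:, 0] + 0.5 * prs[:, 1]   # noise-free, for illustration only
coef = fit_meta_prs(prs, y)
print(coef)
```

The design difference is where the combination happens: meta-GWAS merges effect estimates variant by variant before a single PRS is built, whereas meta-PRS builds one score per data source and learns only a handful of combination weights from the individual-level data.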
Project description:Most existing tools for constructing genetic prediction models begin with the assumption that all genetic variants contribute equally towards the phenotype. However, this represents a suboptimal model for how heritability is distributed across the genome. Therefore, we develop prediction tools that allow the user to specify the heritability model. We compare individual-level data prediction tools using 14 UK Biobank phenotypes; our new tool LDAK-Bolt-Predict outperforms the existing tools Lasso, BLUP, Bolt-LMM and BayesR for all 14 phenotypes. We compare summary statistic prediction tools using 225 UK Biobank phenotypes; our new tool LDAK-BayesR-SS outperforms the existing tools lassosum, sBLUP, LDpred and SBayesR for 223 of the 225 phenotypes. When we improve the heritability model, the proportion of phenotypic variance explained increases by on average 14%, which is equivalent to increasing the sample size by a quarter.
Project description:The estimation of isoform abundances from RNA-Seq data requires a time-intensive step of mapping reads to either an assembled or previously annotated transcriptome, followed by an optimization procedure for deconvolution of multi-mapping reads. These procedures are essential for downstream analyses such as differential expression. In cases where it is desirable to adjust the underlying annotation, for example, on the discovery of novel isoforms or errors in existing annotations, current pipelines must be rerun from scratch. This makes it difficult to update abundance estimates after re-annotation, or to explore the effect of changes in the transcriptome on analyses. We present a novel, efficient algorithm for updating abundance estimates from RNA-Seq experiments on re-annotation that does not require re-analysis of the entire dataset. Our approach is based on a fast partitioning algorithm for identifying transcripts whose abundances may depend on the added or deleted isoforms, and on a fast follow-up approach to re-estimating abundances for all transcripts. We demonstrate the effectiveness of our methods by showing how to synchronize RNA-Seq abundance estimates with the daily RefSeq incremental updates. Thus, we provide a practical approach to maintaining relevant databases of RNA-Seq derived abundance estimates even as annotations are being constantly revised. AVAILABILITY AND IMPLEMENTATION: Our methods are implemented in software called ReXpress and are freely available, together with source code, at http://bio.math.berkeley.edu/ReXpress/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Project description:Pancreatic ductal adenocarcinoma (PDAC) is a leading cause of cancer mortality worldwide. However, its predictive markers for long-term survival are not well known. It is therefore of interest to delineate individual-specific perturbed genes when comparing long-term (LT) and short-term (ST) PDAC survivors and to integrate individual- and group-based transcriptome profiling. Using a discovery cohort of 19 PDAC patients from CHU-Liège (Belgium), we first performed differential gene expression analysis comparing LT to ST survivors. Second, we adopted systems biology approaches to obtain clinically relevant gene modules. Third, we created individual-specific perturbation profiles. Furthermore, we used the Degree-Aware disease gene prioritizing (DADA) method to develop PDAC disease modules and Network-based Integration of Multi-omics Data (NetICS) to integrate group-based and individual-specific perturbed genes in relation to PDAC LT survival. We identified 173 differentially expressed genes (DEGs) between ST and LT survivors and five modules (including 38 DEGs) showing associations with clinical traits. Validation of DEGs in the molecular lab suggested a role of REG4 and TSPAN8 in PDAC survival. Via NetICS and DADA, we identified various known oncogenes such as CUL1 and TGFB1. Our proposed analytic workflow shows the advantages of combining clinical and omics data as well as individual- and group-level transcriptome profiling.
Project description:Cardiovascular disease (CVD) is considered a primary driver of global mortality and is estimated to be responsible for approximately 17.9 million deaths annually. Consequently, a substantial body of research related to CVD has developed, with an emphasis on identifying strategies for the prevention and effective treatment of CVD. In this review, we critically examine the existing CVD literature, and specifically highlight the contribution of Mendelian randomization analyses in CVD research. Throughout this review, we assess the extent to which research findings agree across a range of studies of differing design within a triangulation framework. If differing study designs are subject to non-overlapping sources of bias, consistent findings limit the extent to which results are merely an artefact of study design. Consequently, broad agreement across differing studies can be viewed as providing more robust causal evidence in contrast to limiting the scope of the review to a single specific study design. Utilising the triangulation approach, we highlight emerging patterns in research findings, and explore the potential of identified risk factors as targets for precision medicine and novel interventions.
Project description:Diagnostic coronary angiography in asymptomatic patients may lead to inappropriate percutaneous coronary intervention (PCI) due to a diagnostic-therapeutic cascade. Understanding the association between patient selection for coronary angiography and PCI appropriateness may inform strategies to minimize inappropriate procedures. To determine if hospitals that frequently perform coronary angiography in asymptomatic patients, a clinical scenario in which the benefit of angiography is less clear, are more likely to perform inappropriate PCI. Multicenter observational study of 544 hospitals participating in the CathPCI Registry between July 1, 2009, and September 30, 2013. Hospital proportion of asymptomatic patients at diagnostic coronary angiography and hospital rate of inappropriate PCI as defined by 2012 appropriate use criteria for coronary revascularization. Of 1,225,562 patients who underwent elective coronary angiography, 308,083 (25.1%) were asymptomatic. The hospital proportion of angiography among asymptomatic patients ranged from 1.0% to 73.6% (median, 24.7%; interquartile range, 15.9%-35.9%). By hospital quartile of asymptomatic patients at angiography, hospitals with higher rates of asymptomatic patients at angiography had higher median rates of inappropriate PCI (14.8% vs 20.2% vs 24.0% vs 29.4% from lowest to highest quartile, P < .001 for trend). This outcome was attributable to more frequent use of inappropriate PCI in asymptomatic patients at hospitals with higher rates of angiography in asymptomatic patients (5.4% vs 9.9% vs 14.7% vs 21.6% from lowest to highest quartile, P < .001 for trend).
Hospitals with higher rates of asymptomatic patients at angiography also had lower rates of appropriate PCI (38.7% vs 33.0% vs 32.3% vs 32.9% from lowest to highest quartile, P < .001 for trend). In a national sample of hospitals, performance of coronary angiography in asymptomatic patients was associated with higher rates of inappropriate PCI and lower rates of appropriate PCI. Improving preprocedural risk stratification and thresholds for coronary angiography may be one strategy to improve the appropriateness of PCI.
Project description:BACKGROUND: Individual patient data meta-analyses (IPDMAs) prevail as the gold standard in clinical evaluations. We investigated the distribution and epidemiological characteristics of published IPDMA articles. METHODOLOGY/PRINCIPAL FINDINGS: IPDMA articles were identified through comprehensive literature searches of PubMed, Embase, and the Cochrane Library. Two investigators independently conducted article identification, data classification and extraction. Data related to the article characteristics were collected and analyzed descriptively. A total of 829 IPDMA articles indexed until 9 August 2012 were identified. An average of 3.7 IPDMA articles was published per year. Malignant neoplasms (267 [32.2%]) and circulatory diseases (179 [21.6%]) were the most frequently occurring topics. Each IPDMA article included a median of 8 studies (interquartile range [IQR], 5 to 15) involving 2,563 patients (IQR, 927 to 8,349). Among the 829 IPDMA articles, 229 (27.6%) did not perform a systematic search to identify related studies. In total, 207 (25.0%) sought and included individual patient data (IPD) from the "grey literature". Only 496 (59.8%) successfully obtained IPD from all identified studies. CONCLUSIONS/SIGNIFICANCE: The number of IPDMA articles exhibited an increasing trend over the past few years, and the articles mainly focused on cancer and circulatory diseases. Our data indicated that literature searches, inclusion of grey literature, and data availability were inconsistent among different IPDMA articles. Biases may therefore arise. Thus, decision makers should not uncritically accept all IPDMAs.
Project description:OBJECTIVE: An approach for assessing the urinary microbiome is 16S rRNA gene sequencing, where analysis methods are rapidly evolving. This re-analysis of an existing dataset aimed to determine whether updated bioinformatic and statistical techniques affect clinical inferences. METHODS: A prior study compared the urinary microbiome in 123 women with mixed urinary incontinence (MUI) and 84 controls. We obtained unprocessed sequencing data from multiple variable regions, processed operational taxonomic unit (OTU) tables from the original analysis, and de-identified clinical data. We re-processed sequencing data with DADA2 to generate amplicon sequence variant (ASV) tables. Taxa from ASV tables were compared to the original OTU tables; taxa from different variable regions after updated processing were also compared. Bayesian graphical compositional regression (BGCR) was used to test for associations between microbial compositions and clinical phenotypes (e.g., MUI versus control) while adjusting for clinical covariates. Several techniques were used to cluster samples into microbial communities. Multivariable regression was used to test for associations between microbial communities and MUI, again while adjusting for potentially confounding variables. RESULTS: Of taxa identified through updated bioinformatic processing, only 40% were identified originally, though taxa identified through both methods represented >99% of the sequencing data in terms of relative abundance. Different 16S rRNA gene regions resulted in different recovered taxa. With BGCR analysis, there was a low (33.7%) probability of an association between overall microbial compositions and clinical phenotype. However, when microbial data were clustered into bacterial communities, we confirmed that bacterial communities are associated with MUI.
Contrary to the originally published analysis, we did not identify different associations by age group, which may be due to the incorporation of different covariates in statistical models. CONCLUSIONS: Updated bioinformatic processing techniques recover different taxa compared to earlier techniques, though most of these differences exist in low abundance taxa that occupy a small proportion of the overall microbiome. While overall microbial compositions are not associated with MUI, we confirmed associations between certain communities of bacteria and MUI. Incorporation of several covariates that are associated with the urinary microbiome improved inferences when assessing for associations between bacterial communities and MUI in multivariable models.