Dataset Information

BioBin: a bioinformatics tool for automating the binning of rare variants using publicly available biological knowledge.

ABSTRACT: With the recent decrease in the cost of genome sequence data, there has been increasing interest in rare variants and in methods to detect their association with disease. We developed BioBin, a flexible collapsing method inspired by biological knowledge that can be used to automate the binning of low frequency variants for association testing. We also built the Library of Knowledge Integration (LOKI), a repository of data assembled from public databases, which contains resources such as dbSNP and Entrez Gene database information from the National Center for Biotechnology Information (NCBI), pathway information from Gene Ontology (GO), Protein families database (Pfam), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, NetPath (signal transduction pathways), Open Regulatory Annotation Database (ORegAnno), Biological General Repository for Interaction Datasets (BioGrid), Pharmacogenomics Knowledge Base (PharmGKB), Molecular INTeraction database (MINT), and evolutionary conserved regions (ECRs) from the UCSC Genome Browser. The novelty of BioBin is access to comprehensive knowledge-guided multi-level binning. For example, bin boundaries can be formed using genomic locations from functional regions, evolutionary conserved regions, genes, and/or pathways. We tested BioBin on simulated data and 1000 Genomes Project low-coverage data, evaluating the method with simulated causative variants and with a pairwise comparison of rare variant (MAF < 0.03) burden differences between Yoruba individuals (YRI) and individuals of European descent (CEU). Lastly, we analyzed the NHLBI GO Exome Sequencing Project Kabuki dataset (Kabuki syndrome is a congenital disorder affecting multiple organs and often involving intellectual disability), contrasted with Complete Genomics data as controls. The results from our simulation studies indicate that the type I error rate is controlled; however, power falls quickly for small sample sizes using variants with modest effect sizes. Using BioBin, we were able to find simulated variants in genes with fewer than 20 loci, but sensitivity was much lower in large bins. We also highlighted the scale of population stratification between two 1000 Genomes Project populations, CEU and YRI. Lastly, we were able to apply BioBin to natural biological data from dbGaP and identify an interesting candidate gene for further study. We have established that BioBin will be a very practical and flexible tool to analyze sequence data and potentially uncover novel associations between low frequency variants and complex disease.
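
The collapsing idea described in the abstract can be illustrated with a minimal Python sketch. It is not BioBin's implementation and does not use LOKI: the gene boundaries, variant records, and the function names `bin_rare_variants` and `burden_per_sample` are all hypothetical, chosen only to show how variants below a MAF threshold (0.03, as in the abstract) might be grouped into gene-level bins and summed per sample.

```python
from collections import defaultdict

# Hypothetical gene boundaries: name -> (chromosome, start, end), inclusive.
# In BioBin these would come from LOKI; here they are made up for illustration.
GENES = {
    "GENE_A": ("1", 100, 500),
    "GENE_B": ("1", 400, 900),
}


def bin_rare_variants(variants, genes, maf_threshold=0.03):
    """Place each variant with MAF below the threshold into every gene bin
    whose boundaries contain its position."""
    bins = defaultdict(list)
    for var_id, chrom, pos, maf in variants:
        if maf >= maf_threshold:
            continue  # common variant: left out of the bins
        for gene, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                bins[gene].append(var_id)
    return dict(bins)


def burden_per_sample(bins, genotypes):
    """Sum minor-allele counts (0, 1, or 2) over the variants in each bin.

    genotypes maps sample -> {variant_id: minor-allele count}; missing
    variants are treated as 0 copies.
    """
    return {
        sample: {
            gene: sum(calls.get(v, 0) for v in var_ids)
            for gene, var_ids in bins.items()
        }
        for sample, calls in genotypes.items()
    }


# Toy input: (variant id, chromosome, position, minor allele frequency).
variants = [
    ("rs1", "1", 150, 0.01),
    ("rs2", "1", 450, 0.02),   # falls in both overlapping genes
    ("rs3", "1", 600, 0.20),   # common, so it is skipped
    ("rs4", "1", 800, 0.005),
]
genotypes = {
    "s1": {"rs1": 1, "rs4": 2},
    "s2": {"rs2": 1, "rs3": 2},
}

bins = bin_rare_variants(variants, GENES)
burden = burden_per_sample(bins, genotypes)
```

The same pattern would extend to other bin types mentioned in the abstract (pathways, functional regions, ECRs) by swapping the boundary table; the per-bin burden values are what a downstream association test would compare between groups.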

SUBMITTER: Moore CB

PROVIDER: S-EPMC3654874 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

BioBin: a bioinformatics tool for automating the binning of rare variants using publicly available biological knowledge.

Moore CB, Wallace JR, Frase AT, Pendergrass SA, Ritchie MD

BMC Medical Genomics, 2013-05-07

PMID: 23819467

Similar Datasets

Project description: Background: The air traffic management (ATM) system has historically coped with a global increase in traffic demand, ultimately leading to increased operational complexity. When dealing with the impact of this increasing complexity on system safety, it is crucial to automatically analyse the losses of separation (LoSs) using tools able to extract meaningful and actionable information from safety reports. Current research in this field mainly exploits natural language processing (NLP) to categorise the reports, with the limitations that the considered categories need to be manually annotated by experts and that general taxonomies are seldom exploited. Methods: To address the current gaps, the authors propose to perform exploratory data analysis on safety reports, combining state-of-the-art techniques such as topic modelling and clustering, and then to develop an algorithm able to extract the Toolkit for ATM Occurrence Investigation (TOKAI) taxonomy factors from the free-text safety reports based on syntactic analysis. TOKAI is an investigation tool developed by EUROCONTROL, and its taxonomy is intended to become a standard and harmonised approach to future investigations. Results: Leveraging the LoS events reported in the public databases of the Comisión de Estudio y Análisis de Notificaciones de Incidentes de Tránsito Aéreo and the United Kingdom Airprox Board, the authors show how their proposal is able to automatically extract meaningful and actionable information from safety reports, as well as to classify their content according to the TOKAI taxonomy. The quality of the approach is also indirectly validated by checking the connection between the identified factors and the main contributor of the incidents. Conclusions: The authors' results are a promising first step toward the full automation of a general analysis of LoS reports, supported by results on real-world data coming from two different sources. In the future, the authors' proposal could be extended to other taxonomies or tailored to identify factors to be included in the safety taxonomies.

Project description: Background: Rapid advancement of next generation sequencing technologies such as whole genome sequencing (WGS) has facilitated the search for genetic factors that influence disease risk in the field of human genetics. To identify rare variants associated with human diseases or traits, an efficient genome-wide binning approach is needed. In this study we developed a novel biological knowledge-based binning approach for rare-variant association analysis and then applied the approach to structural neuroimaging endophenotypes related to late-onset Alzheimer's disease (LOAD). Methods: For rare-variant analysis, we used the knowledge-driven binning approach implemented in Bin-KAT, an automated tool that provides 1) binning/collapsing methods for multi-level variant aggregation with a flexible, biologically informed binning strategy and 2) an option of performing unified collapsing and statistical rare variant analyses in one tool. A total of 750 non-Hispanic Caucasian participants from the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort who had both WGS data and magnetic resonance imaging (MRI) scans were used in this study. Mean bilateral cortical thickness of the entorhinal cortex extracted from MRI scans was used as an AD-related neuroimaging endophenotype. SKAT was used for a genome-wide gene- and region-based association analysis of rare variants (minor allele frequency (MAF) < 0.05), and potential confounding factors for entorhinal cortex thickness (age, gender, years of education, intracranial volume (ICV) and MRI field strength) were used as covariates. Significant associations were determined using FDR adjustment for multiple comparisons. Results: Our knowledge-driven binning approach identified 16 functional exonic rare variants in FANCC significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In addition, the approach identified 7 evolutionary conserved regions, which were mapped to FAF1, RFX7, LYPLAL1 and GOLGA3, significantly associated with entorhinal cortex thickness (FDR-corrected p-value < 0.05). In further analysis, the functional exonic rare variants in FANCC were also significantly associated with hippocampal volume and cerebrospinal fluid (CSF) Aβ1-42 (p-value < 0.05). Conclusions: Our novel binning approach identified rare variants in FANCC as well as 7 evolutionary conserved regions significantly associated with a LOAD-related neuroimaging endophenotype. FANCC (Fanconi anemia complementation group C) has been shown to modulate TLR and p38 MAPK-dependent expression of IL-1β in macrophages. Our results warrant further investigation in a larger independent cohort and demonstrate that the biological knowledge-driven binning approach is a powerful strategy to identify rare variants associated with AD and other complex diseases.
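
The description above mentions FDR adjustment for multiple comparisons without naming the procedure. As a hedged illustration only, the sketch below assumes the commonly used Benjamini-Hochberg step-up method; the function name `benjamini_hochberg` and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the source only says "FDR adjustment"; BH is used here
    as one standard choice, not as the study's confirmed method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is the
    # minimum of p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-bin p-values from an association test.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [a < 0.05 for a in adj]
```

A bin would be reported as significant at the 0.05 level when its adjusted value falls below 0.05, which mirrors the "FDR-corrected p-value < 0.05" criterion quoted in the description.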

Project description: BACKGROUND: Computational models play an increasingly important role in the assessment and control of public health crises, as demonstrated during the 2009 H1N1 influenza pandemic. Much research has been done in recent years in the development of sophisticated data-driven models for realistic computer-based simulations of infectious disease spreading. However, only a few computational tools are presently available for assessing scenarios, predicting epidemic evolutions, and managing health emergencies that can benefit a broad audience of users including policy makers and health institutions. RESULTS: We present "GLEaMviz", a publicly available software system that simulates the spread of emerging human-to-human infectious diseases across the world. The GLEaMviz tool comprises three components: the client application, the proxy middleware, and the simulation engine. The latter two components constitute the GLEaMviz server. The simulation engine leverages the Global Epidemic and Mobility (GLEaM) framework, a stochastic computational scheme that integrates worldwide high-resolution demographic and mobility data to simulate disease spread on the global scale. The GLEaMviz design aims at maximizing flexibility in defining the disease compartmental model and configuring the simulation scenario; it allows the user to set a variety of parameters including compartment-specific features, transition values, and environmental effects. The output is a dynamic map and a corresponding set of charts that quantitatively describe the geo-temporal evolution of the disease. The software is designed as a client-server system. The multi-platform client, which can be installed on the user's local machine, is used to set up simulations that will be executed on the server, thus avoiding specific requirements for large computational capabilities on the user side. CONCLUSIONS: The user-friendly graphical interface of the GLEaMviz tool, along with its high level of detail and the realism of its embedded modeling approach, opens up the platform to simulate realistic epidemic scenarios. These features make the GLEaMviz computational tool a convenient teaching/training tool as well as a first step toward the development of a computational tool aimed at facilitating the use and exploitation of computational models for the policy making and scenario analysis of infectious disease outbreaks.

