Dataset Information

SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS.

ABSTRACT: Correlating specific genotypes with human diseases remains a complex problem, despite the advantages offered by high-throughput technologies such as Genome Wide Association Studies (GWAS). New tools for interpreting genetic variants and for prioritizing Single Nucleotide Polymorphisms (SNPs) are needed. Given a list of the most relevant SNPs statistically associated with a specific pathology as the result of a genotype study, a critical issue is identifying the genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Conversely, given a list of genes, it can be of great importance to predict which SNPs may be involved in the onset of a particular disease, in order to focus research on their effects.

We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated with pathologies. This system can be employed to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The proposed method relies on (1) the integration of data from public resources using a gene-centric database design, (2) the evaluation of a set of static biomolecular annotations, defined as features, and (3) the SNP scoring function, which computes SNP scores using parameters and weights set by users. We employed a machine learning classifier to set default feature weights and an ontological annotation layer to enable the enrichment of the input gene set. We implemented our method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), improving on our first published release of this system. A user-friendly interface allows users to input a list of genes, SNPs or a biological process, and to customize the feature set with relative weights. As a result, SNPranker 2.0 returns a list of SNPs, localized within the input and ontologically enriched genes, together with their prioritization scores.

Different databases and resources are already available for SNP annotation, but they do not prioritize or re-score SNPs relying on a priori biomolecular knowledge. SNPranker 2.0 attempts to fill this gap through a user-friendly integrated web resource. End users, such as researchers in medical genetics and epidemiology, may find in SNPranker 2.0 a new tool for data mining and interpretation that supports SNP analysis. Possible scenarios are GWAS data re-scoring, SNP selection for custom genotyping arrays and SNP/disease association studies.
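The abstract does not give the form of the SNP scoring function, so the following is only a minimal sketch of the general idea of scoring SNPs from features with user-set weights. It assumes a normalized weighted sum over feature values in [0, 1]; the class `Snp`, the functions `score_snp` and `rank_snps`, and all feature names, gene names and SNP identifiers are hypothetical and are not taken from SNPranker 2.0.

```python
from dataclasses import dataclass, field


@dataclass
class Snp:
    """A SNP with its gene and feature values (all names are hypothetical)."""
    snp_id: str
    gene: str
    # Feature name -> value in [0, 1]; missing features count as 0.
    features: dict = field(default_factory=dict)


def score_snp(snp, weights):
    """Assumed scoring rule: weighted sum of features, divided by total weight."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(w * snp.features.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def rank_snps(snps, weights, genes=None):
    """Score SNPs (optionally only those in `genes`) and sort by descending score."""
    selected = [s for s in snps if genes is None or s.gene in genes]
    scored = [(s.snp_id, score_snp(s, weights)) for s in selected]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical user-set weights and annotations, for illustration only.
    weights = {"coding_region": 2.0, "conserved": 1.0}
    snps = [
        Snp("snp_a", "GENE_1", {"coding_region": 1.0, "conserved": 0.0}),
        Snp("snp_b", "GENE_1", {"coding_region": 0.0, "conserved": 1.0}),
        Snp("snp_c", "GENE_2", {"coding_region": 1.0, "conserved": 1.0}),
    ]
    for snp_id, score in rank_snps(snps, weights, genes={"GENE_1", "GENE_2"}):
        print(snp_id, round(score, 3))
```

Dividing by the total weight keeps scores comparable when a user changes the weights. This is a design choice of the sketch, not something stated in the abstract.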

SUBMITTER: Merelli I

PROVIDER: S-EPMC3548692 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS.

Merelli I, Calabria A, Cozzi P, Viti F, Mosca E, Milanesi L

BMC Bioinformatics, 2013-01-14

PMID: 23369106

Similar Datasets

Project description:

Background: Identification of pathogenic bacteria from clinical specimens and evaluating their antimicrobial resistance (AMR) are laborious tasks that involve in vitro cultivation, isolation, and susceptibility testing. Recently, a number of methods have been developed that use machine learning algorithms applied to the whole-genome sequencing data of isolates to approach this problem. However, making AMR assessments from more easily available metagenomic sequencing data remains a big challenge.

Results: We present the Metagenomic Sequencing to Antimicrobial Resistance (MGS2AMR) pipeline, which detects antibiotic resistance genes (ARG) and their possible organism of origin within a sequenced metagenomics sample. This in silico method allows for the evaluation of bacterial AMR directly from clinical specimens, such as stool samples. We have developed two new algorithms to optimize and annotate the genomic assembly paths within the raw Graphical Fragment Assembly (GFA): the GFA Linear Optimal Path through seed segments (GLOPS) algorithm and the Adapted Dijkstra Algorithm for GFA (ADAG). These novel algorithms improve the sensitivity of ARG detection and aid in species annotation. Tests based on 1200 microbiome samples show a high ARG recall rate and correct assignment of the ARG origin. The MGS2AMR output can further be used in many downstream applications, such as evaluating AMR to specific antibiotics in samples from emerging intestinal infections. We demonstrate that the MGS2AMR-derived data is as informative for the entailing prediction models as the whole-genome sequencing (WGS) data. The performance of these models is on par with our previously published method (WGS2AMR), which is based on the sequencing data of bacterial isolates.

Conclusions: MGS2AMR can provide researchers with valuable insights into the AMR content of microbiome environments and may potentially improve patient care by providing faster quantification of resistance against specific antibiotics, thereby reducing the use of broad-spectrum antibiotics. The presented pipeline also has potential applications in other metagenome analyses focused on the defined sets of genes.
