Dataset Information

Predicting deleterious nsSNPs: an analysis of sequence and structural attributes.

ABSTRACT:

BACKGROUND: There has been an explosion in the number of single nucleotide polymorphisms (SNPs) within public databases. In this study we focused on non-synonymous protein coding single nucleotide polymorphisms (nsSNPs), some associated with disease and others which are thought to be neutral. We describe the distribution of both types of nsSNPs using structural and sequence based features and assess the relative value of these attributes as predictors of function using machine learning methods. We also address the common problem of balance within machine learning methods and show the effect of imbalance on nsSNP function prediction. We show that nsSNP function prediction can be significantly improved by 100% undersampling of the majority class. The learnt rules were then applied to make predictions of function on all nsSNPs within Ensembl.

RESULTS: The measure of prediction success is greatly affected by the level of imbalance in the training dataset. We found the balanced dataset that included all attributes produced the best prediction. The performance as measured by the Matthews correlation coefficient (MCC) varied between 0.49 and 0.25 depending on the imbalance. As previously observed, the degree of sequence conservation at the nsSNP position is the single most useful attribute. In addition to conservation, structural predictions made using a balanced dataset can be of value.

CONCLUSION: The predictions for all nsSNPs within Ensembl, based on a balanced dataset using all attributes, are available as a DAS annotation. Instructions for adding the track to Ensembl are at http://www.brightstudy.ac.uk/das_help.html.
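The two technical ideas in the abstract, undersampling the majority class and scoring with the Matthews correlation coefficient, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that "100% undersampling" means randomly reducing the majority class to the size of the minority class, and the function names `undersample_majority` and `mcc` and the toy labels are hypothetical, chosen only for illustration.

```python
import math
import random


def undersample_majority(labels, seed=0):
    """Return sorted indices of a class-balanced subset of `labels`.

    Assumption: every class is randomly reduced to the size of the
    smallest class, so the minority class is kept in full.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    kept = []
    for idx in by_class.values():
        kept.extend(rng.sample(idx, n_min))
    return sorted(kept)


def mcc(y_true, y_pred, positive=1):
    """Matthews correlation coefficient for a binary prediction."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy labels (hypothetical): 1 = disease-associated, 0 = neutral.
labels = [1] * 3 + [0] * 9
balanced = undersample_majority(labels)
print([labels[i] for i in balanced])
print(mcc([1, 1, 0, 0], [1, 1, 0, 0]))
```

MCC uses all four cells of the confusion matrix, so a classifier that always predicts the majority class scores 0 rather than a misleadingly high accuracy, which is why it suits comparisons across different levels of class imbalance.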

SUBMITTER: Dobson RJ

PROVIDER: S-EPMC1489951 | biostudies-literature | 2006

REPOSITORIES: biostudies-literature

Publications

Predicting deleterious nsSNPs: an analysis of sequence and structural attributes.

Dobson RJ, Munroe PB, Caulfield MJ, Saqi MA

BMC Bioinformatics, 2006-04-21

PMID: 16630345

Similar Datasets

Accurate Sequence-Based Prediction of Deleterious nsSNPs with Multiple Sequence Profiles and Putative Binding Residues.
| S-EPMC8469993 | biostudies-literature

Structure analysis of deleterious nsSNPs in human PALB2 protein for functional inference.
| S-EPMC8131579 | biostudies-literature

Structural genomics approach to investigate deleterious impact of nsSNPs in conserved telomere maintenance component 1.
| S-EPMC8119478 | biostudies-literature

Exploration of structural stability in deleterious nsSNPs of the XPA gene: A molecular dynamics approach.
| S-EPMC3243084 | biostudies-literature

Predicting the most deleterious missense nsSNPs of the protein isoforms of the human HLA-G gene and in silico evaluation of their structural and functional consequences.
| S-EPMC7457528 | biostudies-literature

In-Silico Computing of the Most Deleterious nsSNPs in HBA1 Gene.
| S-EPMC4733110 | biostudies-literature

Prediction and Structural Comparison of Deleterious Coding Nonsynonymous Single Nucleotide Polymorphisms (nsSNPs) in Human LEP Gene Associated with Obesity.
| S-EPMC6913293 | biostudies-literature

An ANN model for the identification of deleterious nsSNPs in tumor suppressor genes.
| S-EPMC3064852 | biostudies-other

A novel computational and structural analysis of nsSNPs in CFTR gene.
| S-EPMC2518663 | biostudies-literature

A Simulation Analysis and Screening of Deleterious Nonsynonymous Single Nucleotide Polymorphisms (nsSNPs) in Sheep LEP Gene.
| S-EPMC9377880 | biostudies-literature