Dataset Information

Bayesian estimation of nonsynonymous/synonymous rate ratios for pairwise sequence comparisons.

ABSTRACT: The nonsynonymous/synonymous rate ratio (ω = d(N)/d(S)) is an important measure of the mode and strength of natural selection acting on nonsynonymous mutations in protein-coding genes. The simplest such analysis is the estimation of the d(N)/d(S) ratio using two sequences. Both heuristic counting methods and the maximum-likelihood (ML) method based on a codon substitution model are widely used for such analysis. However, these methods do not have nice statistical properties, as the estimates can be zero or infinity in some data sets, so that their means and variances are infinite. In large genome-scale comparisons, such extreme estimates (either 0 or ∞) of ω and sequence distance (t) are common. Here, we implement a Bayesian method to estimate ω and t in pairwise sequence comparisons. Using a combination of computer simulation and real data analysis, we show that the Bayesian estimates have better statistical properties than the ML estimates, because the prior on ω and t shrinks the posterior of those parameters away from extreme values. We also calculate the posterior probability for ω > 1 as a Bayesian alternative to the likelihood ratio test. The new method is computationally efficient and may be useful for genome-scale comparisons of protein-coding gene sequences.
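
ILLUSTRATION: The shrinkage effect described in the abstract can be sketched with a toy model. This is not the codon substitution model or the implementation used in the paper. It assumes, purely for illustration, that synonymous and nonsynonymous difference counts are Poisson with means t × s_sites and ω × t × n_sites, that ω and t have independent gamma priors with arbitrarily chosen shape 2 and rate 2, and that the posterior is approximated on a truncated grid. The function names, the priors, and the grid bounds are all hypothetical.

```python
import math


def gamma_logpdf(x, shape, rate):
    """Log density of a gamma distribution (shape/rate parameterisation)."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def poisson_loglik(k, mean):
    """Log probability of observing k events under a Poisson mean."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def posterior_summary(n_diff, s_diff, n_sites, s_sites,
                      grid=200, omega_max=10.0, t_max=1.0):
    """Grid approximation to the joint posterior of (omega, t).

    Toy assumption: s_diff ~ Poisson(t * s_sites) and
    n_diff ~ Poisson(omega * t * n_sites), with Gamma(2, 2) priors
    on omega and t (illustrative choices, not taken from the paper).
    Returns (posterior mean of omega, posterior P(omega > 1)).
    """
    # Midpoint grids avoid evaluating log(0) at the boundary.
    omegas = [(i + 0.5) * omega_max / grid for i in range(grid)]
    ts = [(j + 0.5) * t_max / grid for j in range(grid)]

    cells = []
    for w in omegas:
        for t in ts:
            log_post = (gamma_logpdf(w, 2.0, 2.0)
                        + gamma_logpdf(t, 2.0, 2.0)
                        + poisson_loglik(s_diff, t * s_sites)
                        + poisson_loglik(n_diff, w * t * n_sites))
            cells.append((w, log_post))

    # Subtract the peak before exponentiating for numerical stability.
    peak = max(lp for _, lp in cells)
    weights = [(w, math.exp(lp - peak)) for w, lp in cells]
    total = sum(p for _, p in weights)

    mean_omega = sum(w * p for w, p in weights) / total
    prob_gt_one = sum(p for w, p in weights if w > 1.0) / total
    return mean_omega, prob_gt_one


if __name__ == "__main__":
    # No synonymous differences: the ratio-of-rates estimate is infinite.
    print(posterior_summary(n_diff=5, s_diff=0, n_sites=300, s_sites=100))
    # No nonsynonymous differences: the ratio-of-rates estimate is zero.
    print(posterior_summary(n_diff=0, s_diff=5, n_sites=300, s_sites=100))
```

In both cases the simple ratio estimate of ω under this toy model is 0 or ∞, while the posterior mean stays finite and strictly positive because the prior pulls it away from the boundary. The second returned value plays the role of the posterior probability for ω > 1 mentioned in the abstract.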

SUBMITTER: Angelis K

PROVIDER: S-EPMC4069626 | biostudies-literature | 2014 Jul

REPOSITORIES: biostudies-literature

Publications

Bayesian estimation of nonsynonymous/synonymous rate ratios for pairwise sequence comparisons.

Konstantinos Angelis, Mario dos Reis, Ziheng Yang

Molecular Biology and Evolution, 2014-04-18, issue 7

PMID: 24748652

Similar Datasets

Project description: HIV/AIDS is an ongoing global pandemic, with an estimated 39 million infected worldwide. Early detection is anticipated to help improve outcomes and prevent further infections. Point-of-care diagnostics make HIV/AIDS diagnoses available both earlier and to a broader population. Widespread and automated HIV risk estimation can offer objective guidance. This supports providers in making an informed decision when considering patients with high HIV risk for HIV testing or pre-exposure prophylaxis (PrEP). We propose a novel machine learning method that allows providers to use the data from a patient's previous stays at the clinic to estimate their HIV risk. All features available in the clinical data are considered, making the set of features objective and independent of expert opinions. The proposed method builds on association rules that are derived from the data. The incidence rate ratio (IRR) is determined for each rule. Given a new patient, the mean IRR of all applicable rules is used to estimate their HIV risk. The method was tested and validated on the publicly available clinical database MIMIC-IV, which consists of around 525,000 hospital stays that included a stay at the intensive care unit or emergency department. We evaluated the method using the area under the receiver operating characteristic curve (AUC). The best performance with an AUC of 0.88 was achieved with a model consisting of 53 rules. A threshold value of 0.66 leads to a sensitivity of 98% and a specificity of 53%. The rules were grouped into drug abuse, psychological illnesses (e.g., PTSD), previously known associations (e.g., pulmonary diseases), and new associations (e.g., certain diagnostic procedures). In conclusion, we propose a novel HIV risk estimation method that builds on existing clinical data. It incorporates a wide range of features, leading to a model that is independent of expert opinions. It supports providers in making informed decisions in the point-of-care diagnostics process by estimating a patient's HIV risk.

Project description: Neurons use sequences of action potentials (spikes) to convey information across neuronal networks. In neurophysiology experiments, information about external stimuli or behavioral tasks has frequently been characterized in terms of neuronal firing rate. The firing rate is conventionally estimated by averaging spiking responses across multiple similar experiments (or trials). However, a number of applications in neuroscience research require the firing rate to be estimated on a single-trial basis. Estimating firing rate from a single trial is a challenging problem and current state-of-the-art methods do not perform well. To address this issue, we develop a new method for estimating firing rate based on a kernel smoothing technique that considers the bandwidth as a random variable with a prior distribution that is adaptively updated under an empirical Bayesian framework. By carefully selecting the prior distribution together with a Gaussian kernel function, an analytical expression can be achieved for the kernel bandwidth. We refer to the proposed method as Bayesian Adaptive Kernel Smoother (BAKS). We evaluate the performance of BAKS using synthetic spike train data generated by biologically plausible models: inhomogeneous Gamma (IG) and inhomogeneous inverse Gaussian (IIG). We also apply BAKS to real spike train data from non-human primate (NHP) motor and visual cortex. We benchmark the proposed method against established and previously reported methods. These include: optimized kernel smoother (OKS), variable kernel smoother (VKS), local polynomial fit (Locfit), and Bayesian adaptive regression splines (BARS). Results using both synthetic and real data demonstrate that the proposed method achieves better performance compared to competing methods. This suggests that the proposed method could be useful for understanding the encoding mechanism of neurons in cognitive-related tasks. The proposed method could also potentially improve the performance of a brain-machine interface (BMI) decoder that relies on estimated firing rate as the input.
