A two-stage pruning algorithm for likelihood computation for a population tree.

ABSTRACT: We have developed a pruning algorithm for likelihood estimation of a tree of populations. This algorithm enables us to compute the likelihood for large trees. Thus, it gives an efficient way of obtaining the maximum-likelihood estimate (MLE) for a given tree topology. Our method utilizes the differences accumulated by random genetic drift in allele count data from single-nucleotide polymorphisms (SNPs), ignoring the effect of mutation after divergence from the common ancestral population. The computation of the maximum-likelihood tree involves both maximizing likelihood over branch lengths of a given topology and comparing the maximum-likelihood across topologies. Here our focus is the maximization of likelihood over branch lengths of a given topology. The pruning algorithm computes arrays of probabilities at the root of the tree from the data at the tips of the tree; at the root, the arrays determine the likelihood. The arrays consist of probabilities related to the number of coalescences and allele counts for the partially coalesced lineages. Computing these probabilities requires an unusual two-stage algorithm. Our computation is exact and avoids time-consuming Monte Carlo methods. We can also correct for ascertainment bias.

SUBMITTER: RoyChoudhury A

PROVIDER: S-EPMC2567359 | biostudies-other | 2008 Oct

REPOSITORIES: biostudies-other

Similar Datasets

A topology-marginal composite likelihood via a generalized phylogenetic pruning algorithm.

Project description:Bayesian phylogenetics is a computationally challenging inferential problem. Classical methods are based on random-walk Markov chain Monte Carlo (MCMC), where random proposals are made on the tree parameter and the continuous parameters simultaneously. Variational phylogenetics is a promising alternative to MCMC, in which one fits an approximating distribution to the unnormalized phylogenetic posterior. Previous work fit this variational approximation using stochastic gradient descent, which is the canonical way of fitting general variational approximations. However, phylogenetic trees are special structures, giving opportunities for efficient computation. In this paper we describe a new algorithm that directly generalizes the Felsenstein pruning algorithm (a.k.a. sum-product algorithm) to compute a composite-like likelihood by marginalizing out ancestral states and subtrees simultaneously. We show the utility of this algorithm by rapidly making point estimates for branch lengths of a multi-tree phylogenetic model. These estimates accord with a long MCMC run and with estimates obtained using a variational method, but are much faster to obtain. Thus, although generalized pruning does not lead to a variational algorithm as such, we believe that it will form a useful starting point for variational inference.

| S-EPMC10391877 | biostudies-literature
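
For orientation, the classical Felsenstein pruning recursion that the abstract above says is generalized can be sketched as follows. This is a minimal illustration, not the generalized algorithm from that paper: it assumes a Jukes-Cantor substitution model, a single alignment site, and a hypothetical nested-list tree encoding (`jc69_transition`, `partial_likelihoods`, and `site_likelihood` are illustrative names, not taken from any source).

```python
import math

STATES = "ACGT"

def jc69_transition(t):
    """Jukes-Cantor transition matrix for branch length t
    (expected substitutions per site). Assumed model, for illustration."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial_likelihoods(node):
    """Return L[s] = P(tip data below node | node is in state s).

    Hypothetical encoding: a tip is a one-letter string such as "A";
    an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    result = [1.0] * 4
    for child, t in node:
        p = jc69_transition(t)
        child_l = partial_likelihoods(child)
        # Sum over the child's state, then multiply across children.
        for i in range(4):
            result[i] *= sum(p[i][j] * child_l[j] for j in range(4))
    return result

def site_likelihood(root):
    """Likelihood of one site, with a uniform prior over the root state."""
    return sum(0.25 * l for l in partial_likelihoods(root))

# Example tree ((A:0.1, A:0.2):0.05, G:0.3) in the encoding above.
example_tree = [([("A", 0.1), ("A", 0.2)], 0.05), ("G", 0.3)]
example_likelihood = site_likelihood(example_tree)
```

Each node is visited once, so the cost is linear in the number of nodes, which is what makes likelihood evaluation on large trees practical.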

An efficient algorithm for accurate computation of the Dirichlet-multinomial log-likelihood function.

Project description:The Dirichlet-multinomial (DMN) distribution is a fundamental model for multicategory count data with overdispersion. This distribution has many uses in bioinformatics, including applications to metagenomics data, transcriptomics and alternative splicing. The DMN distribution reduces to the multinomial distribution when the overdispersion parameter ψ is 0. Unfortunately, numerical computation of the DMN log-likelihood function by conventional methods results in instability in the neighborhood of ψ = 0. An alternative formulation circumvents this instability, but it leads to long runtimes that make it impractical for the large count data common in bioinformatics. We have developed a new method for computation of the DMN log-likelihood to solve the instability problem without incurring long runtimes. The new approach is composed of a novel formula and an algorithm to extend its applicability. Our numerical experiments show that this new method improves both the accuracy of log-likelihood evaluation and the runtime by several orders of magnitude, especially in the high-count data situations that are common in deep sequencing data. Using real metagenomic data, our method achieves manyfold runtime improvement. Our method increases the feasibility of using the DMN distribution to model many high-throughput problems in bioinformatics. We have included in our work an R package giving access to this method and a vignette applying this approach to metagenomic data.

| S-EPMC4081639 | biostudies-literature
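
As a point of reference for the abstract above, the conventional DMN log-likelihood is usually written with log-gamma functions. The sketch below shows that conventional formula only, not the new method the paper proposes. It is parameterized directly by Dirichlet concentration parameters `alpha`, an assumption made here because the abstract does not spell out how ψ maps onto them; the function name is hypothetical.

```python
import math

def dmn_log_likelihood(counts, alpha):
    """Conventional Dirichlet-multinomial log-likelihood via log-gamma.

    counts: non-negative integer counts, one per category.
    alpha:  positive Dirichlet concentration parameters (assumed
            parameterization; the abstract works with psi instead).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient numerator and normalizing ratio Gamma(a) / Gamma(n + a).
    ll = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        # Per-category term: Gamma(x + al) / (Gamma(al) * x!).
        ll += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return ll
```

With two categories and `alpha = [1.0, 1.0]`, every split of n observations has probability 1/(n + 1), which gives a quick sanity check. When the concentration parameters become very large (the near-multinomial regime), the log-gamma terms are large and nearly cancel, which is one way to see why this form can lose accuracy there.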

BayesAge: A Maximum Likelihood Algorithm To Predict Epigenetic Age

Project description:DNA methylation is a reaction that results in the formation of 5-methylcytosine when a methyl group is added to the cytosine’s C5 position. As organisms age, DNA methylation patterns change in a reproducible fashion. This phenomenon has established DNA methylation as a valuable biomarker in aging studies. Epigenetic clocks based on weighted combinations of methylation sites have been developed to accurately predict the age of an individual from their methylome. However, many epigenetic clocks, particularly those that utilize penalized regression, model the changes in methylation linearly with age. Moreover, these models, which use methylation levels as features, are not robust to missing data and do not account for the count-based nature of bisulfite sequence data. Additionally, the models are generally not interpretable. To overcome these challenges, we present BayesAge, an extension of the previously developed scAge approach that was developed for the analysis of single cell DNA methylation datasets. BayesAge utilizes maximum likelihood estimation (MLE) to infer ages, models count data using binomial distributions, and uses LOWESS smoothing to capture the non-linear dynamics between methylation and age. Our approach is designed for use with bulk bisulfite sequencing datasets. BayesAge outperforms scAge in several respects. Specifically, BayesAge’s age residuals are not age associated, thus providing a less biased representation of epigenetic age variation across populations. Moreover, BayesAge enables the estimation of error bounds on age inference and, when run on downsampled data, its coefficient of determination between predicted and actual ages surpasses both scAge and penalized regression.

2024-03-20 | GSE261769 | GEO

Bayesian computation via empirical likelihood.

Project description:Approximate Bayesian computation has become an essential tool for the analysis of complex stochastic models when the likelihood function is numerically unavailable. However, the well-established statistical method of empirical likelihood provides another route to such settings that bypasses simulations from the model and the choices of the approximate Bayesian computation parameters (summary statistics, distance, tolerance), while being convergent in the number of observations. Furthermore, bypassing model simulations may lead to significant time savings in complex models, for instance those found in population genetics. The Bayesian computation with empirical likelihood algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard distributions, time series, and population genetics models.

| S-EPMC3557074 | biostudies-literature

IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.

Project description:Large phylogenomics data sets require fast tree inference methods, especially for maximum-likelihood (ML) phylogenies. Fast programs exist, but due to inherent heuristics to find optimal trees, it is not clear whether the best tree is found. Thus, there is a need for additional approaches that employ different search strategies to find ML trees and that are at the same time as fast as currently available ML programs. We show that a combination of hill-climbing approaches and a stochastic perturbation method can be time-efficiently implemented. If we allow the same CPU time as RAxML and PhyML, then our software IQ-TREE found higher likelihoods for between 62.2% and 87.1% of the studied alignments, thus efficiently exploring the tree-space. If we use the IQ-TREE stopping rule, RAxML and PhyML are faster in 75.7% and 47.1% of the DNA alignments and 42.2% and 100% of the protein alignments, respectively. However, the range of obtaining higher likelihoods with IQ-TREE improves to 73.3-97.1%. IQ-TREE is freely available at http://www.cibiv.at/software/iqtree.

| S-EPMC4271533 | biostudies-literature

OrthoSNAP: A tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees.

Project description:Molecular evolution studies, such as phylogenomic studies and genome-wide surveys of selection, often rely on gene families of single-copy orthologs (SC-OGs). Large gene families with multiple homologs in 1 or more species (a phenomenon observed among several important families of genes such as transporters and transcription factors) are often ignored because identifying and retrieving SC-OGs nested within them is challenging. To address this issue and increase the number of markers used in molecular evolution studies, we developed OrthoSNAP, a software that uses a phylogenetic framework to simultaneously split gene families into SC-OGs and prune species-specific inparalogs. We term SC-OGs identified by OrthoSNAP as SNAP-OGs because they are identified using a splitting and pruning procedure analogous to snapping branches on a tree. From 415,129 orthologous groups of genes inferred across 7 eukaryotic phylogenomic datasets, we identified 9,821 SC-OGs; using OrthoSNAP on the remaining 405,308 orthologous groups of genes, we identified an additional 10,704 SNAP-OGs. Comparison of SNAP-OGs and SC-OGs revealed that their phylogenetic information content was similar, even in complex datasets that contain a whole-genome duplication, complex patterns of duplication and loss, transcriptome data where each gene typically has multiple transcripts, and contentious branches in the tree of life. OrthoSNAP is useful for increasing the number of markers used in molecular evolution data matrices, a critical step for robustly inferring and exploring the tree of life.

| S-EPMC9595520 | biostudies-literature

Algorithm for the Pruning of Synthesis Graphs.

Project description:Synthesis route planning is at the core of the chemical intelligence that will power autonomous chemistry platforms. In this task, we rely on algorithms to generate possible synthesis routes with the help of retro- and forward-synthetic approaches. Generated synthesis routes can be merged into a synthesis graph which represents theoretical pathways to the target molecule. However, it is often necessary to modify a synthesis graph due to typical constraints. These constraints might include "undesirable substances", e.g., an intermediate that the chemist does not favor or substances that might be toxic. Consequently, we need to prune the synthesis graph by eliminating such undesirable substances. Synthesis graphs can be represented as directed (not necessarily acyclic) bipartite graphs, and the pruning of such graphs in the light of a set of undesirable substances has been an open question. In this study, we present the Synthesis Graph Pruning (SGP) algorithm that addresses this question. The input to the SGP algorithm is a synthesis graph and a set of undesirable substances. Furthermore, information for substances is provided as metadata regarding their availability from the inventory. The SGP algorithm operates with a simple local rule set in order to determine which nodes and edges need to be eliminated from the synthesis graph. In this study, we present the SGP algorithm in detail and provide several case studies that demonstrate the operation of the SGP algorithm. We believe that the SGP algorithm will be an essential component of computer-aided synthesis planning.

| S-EPMC9093600 | biostudies-literature

BayesAge: A Maximum Likelihood Algorithm To Predict Epigenetic Age

Project description:BayesAge: A Maximum Likelihood Algorithm To Predict Epigenetic Age

| PRJNA1088825 | ENA

On computation of semiparametric maximum likelihood estimators with shape constraints.

Project description:Large sample theory of semiparametric models based on maximum likelihood estimation (MLE) with shape constraint on the nonparametric component is well studied. Relatively less attention has been paid to the computational aspect of semiparametric MLE. The computation of semiparametric MLE based on existing approaches such as the expectation-maximization (EM) algorithm can be computationally prohibitive when the missing rate is high. In this paper, we propose a computational framework for semiparametric MLE based on an inexact block coordinate ascent (BCA) algorithm. We show theoretically that the proposed algorithm converges. This computational framework can be applied to a wide range of data with different structures, such as panel count data, interval-censored data, and degradation data, among others. Simulation studies demonstrate favorable performance compared with existing algorithms in terms of accuracy and speed. Two data sets are used to illustrate the proposed computational method. We further implement the proposed computational method in R package BCA1SG, available at CRAN.

| S-EPMC9619418 | biostudies-literature

Two-stage multi-objective evolutionary algorithm for overlapping community discovery.

Project description:As one of the essential topological structures in complex networks, community structure has significant theoretical and application value and has attracted the attention of researchers in many fields. In a social network, individuals may belong to different communities simultaneously, such as a workgroup and a hobby group. Therefore, overlapping community discovery can help us understand and model the network structure of these multiple relationships more accurately. This article proposes a two-stage multi-objective evolutionary algorithm for the overlapping community discovery problem. First, using the initialization method to divide the central node based on node degree, combined with the cross-mutation evolution strategy of the genome matrix, the first stage of non-overlapping community division is completed on the decomposition-based multi-objective optimization framework. Then, based on the result set of the first stage, appropriate nodes are selected from each individual's community as the central node of the initial population in the second stage, and the fuzzy threshold is optimized through the fuzzy clustering method based on evolutionary calculation and the feedback model, to find reasonable overlapping nodes. Finally, tests are conducted on synthetic datasets and real datasets. The statistical results demonstrate that, compared with other representative algorithms, this algorithm performs optimally on test instances and has better results.

| S-EPMC11323150 | biostudies-literature
