Dataset Information

A greedy stacking algorithm for model ensembling and domain weighting.

ABSTRACT: OBJECTIVE: Because it is impossible to know in advance which statistical learning algorithm will perform best on a prediction task, it is common to use stacking methods to ensemble individual learners into a more powerful single learner. Stacking algorithms are usually based on linear models, which may run into problems, especially when predictions are highly correlated. In this study, we develop a greedy algorithm for model stacking that overcomes this issue while remaining very fast and easy to interpret. We evaluate our greedy algorithm on 7 different data sets from various biomedical disciplines and compare it to linear stacking, genetic algorithm stacking and a brute-force approach in different prediction settings. We further apply the algorithm to optimize the weighting of the single domains (e.g., income, education) that make up the German Index of Multiple Deprivation (GIMD) so that the index is highly correlated with mortality. RESULTS: The greedy stacking algorithm provides good ensemble weights and outperforms the linear stacker in many tasks. The brute-force approach is still slightly superior, but it is computationally expensive. The greedy weighting algorithm has a variety of possible applications and is fast and efficient. A Python implementation is provided.
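The abstract does not spell out the greedy procedure itself. As an illustration only, the sketch below implements a well-known greedy scheme, forward ensemble selection with replacement in the style of Caruana et al. (2004), which may differ from the authors' published algorithm. At each step it adds, with replacement, the base model whose inclusion gives the lowest loss for the averaged ensemble. The resulting weights are therefore non-negative and sum to 1, and no regression coefficients are estimated, which avoids the instability that highly correlated predictions cause in linear stackers. The function `greedy_stack`, the helper `mse` and the `n_iter` parameter are hypothetical names chosen for this sketch.

```python
from typing import Callable, List, Sequence

# Hypothetical sketch: Caruana-style greedy ensemble selection,
# not necessarily the algorithm published in the paper.


def mse(pred: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between predictions and targets."""
    return sum((a - b) ** 2 for a, b in zip(pred, y)) / len(y)


def greedy_stack(
    preds: Sequence[Sequence[float]],
    y: Sequence[float],
    loss: Callable[[Sequence[float], Sequence[float]], float] = mse,
    n_iter: int = 20,
) -> List[float]:
    """Return ensemble weights (non-negative, summing to 1).

    preds[k][i] is base model k's prediction for sample i.
    """
    n_models, n = len(preds), len(y)
    counts = [0] * n_models
    running = [0.0] * n  # sum of the predictions of all models picked so far
    for step in range(1, n_iter + 1):
        best_k, best_loss = 0, float("inf")
        for k in range(n_models):
            # Candidate ensemble: average of the already selected models plus model k.
            cand = [(running[i] + preds[k][i]) / step for i in range(n)]
            cand_loss = loss(cand, y)
            if cand_loss < best_loss:  # strict '<': ties keep the lower index
                best_k, best_loss = k, cand_loss
        counts[best_k] += 1
        running = [running[i] + preds[best_k][i] for i in range(n)]
    # Selection frequency of each model becomes its weight.
    return [c / n_iter for c in counts]


if __name__ == "__main__":
    # Two perfectly anti-correlated models whose average is exact.
    print(greedy_stack([[1, 1], [-1, -1]], [0, 0]))
```

Because any loss function can be passed in, the same loop could in principle be pointed at a domain-weighting task like the GIMD one, with per-region domain scores as `preds` and, for example, the negative Pearson correlation between the weighted index and mortality as `loss`. That framing is an assumption for illustration, not the published GIMD procedure.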

SUBMITTER: Kurz CF

PROVIDER: S-EPMC7017540 | biostudies-literature | 2020 Feb

REPOSITORIES: biostudies-literature


Publications

A greedy stacking algorithm for model ensembling and domain weighting.

Kurz Christoph F, Maier Werner, Rink Christian

BMC Research Notes, 2020 Feb 12, 1


PMID: 32051022

Similar Datasets

Project description: Background: Cophylogeny mapping is used to uncover deep coevolutionary associations between two or more phylogenetic histories at a macro coevolutionary scale. As cophylogeny mapping is NP-hard, this technique relies heavily on heuristics to solve all but the most trivial cases. One notable approach utilises a metaheuristic to search only a subset of the exponential number of fixed node orderings possible for the phylogenetic histories in question. This is of particular interest as it is the only known heuristic that guarantees biologically feasible solutions. This has enabled research to focus on larger coevolutionary systems, such as coevolutionary associations between figs and their pollinator wasps, including over 200 taxa. Although able to converge on solutions for problem instances of this size, a reduction from the current cubic running time is required to handle larger systems, such as Wolbachia and their insect hosts. Results: Rather than solving this underlying problem optimally, this work presents a greedy algorithm called TreeCollapse, which uses common topological patterns to recover an approximation of the coevolutionary history where the internal node ordering is fixed. This approach offers a significant speed-up compared to previous methods, running in linear time. This algorithm has been applied to over 100 well-known coevolutionary systems, converging on Pareto optimal solutions in over 68% of test cases, even where in some cases the Pareto optimal solution has not previously been recoverable. Further, while TreeCollapse applies a local search technique, it can guarantee solutions are biologically feasible, making this the fastest method that can provide such a guarantee. Conclusion: As a result, we argue that the newly proposed algorithm is a valuable addition to the field of coevolutionary research. Not only does it offer a significantly faster method to estimate the cost of cophylogeny mappings, but by using this approach in conjunction with existing heuristics, it can assist in recovering a larger subset of the Pareto front than has previously been possible.

Project description: Motivation: The concept of controllability within complex networks is pivotal in determining the minimal set of driver vertices required for the exertion of external signals, thereby enabling control over the entire network's vertices. Target controllability further refines this concept by focusing on a subset of vertices within the network as the specific targets for control; both problems are known to be NP-hard. Crucially, the effectiveness of the driver set in achieving control of the network is contingent upon satisfying a specific rank condition, as introduced by Kalman. On the other hand, structural controllability provides a complementary approach to understanding network control, emphasizing the identification of driver vertices based on the network's structural properties. However, in structural controllability approaches, the Kalman condition may not always be satisfied. Results: In this study, we address the challenge of target controllability by proposing a feed-forward greedy algorithm designed to efficiently handle large networks while meeting the Kalman controllability rank condition. We further enhance our method's efficacy by integrating it with Barabasi et al.'s structural controllability approach. This integration allows for a more comprehensive control strategy, leveraging both the dynamical requirements specified by Kalman's rank condition and the structural properties of the network. Empirical evaluation across various network topologies demonstrates the superior performance of our algorithms compared to existing methods, consistently requiring fewer driver vertices for effective control. Additionally, our method's application to protein-protein interaction networks associated with breast cancer reveals potential drug repurposing candidates, underscoring its biomedical relevance. This study highlights the importance of addressing both structural and dynamical aspects of network controllability for advancing control strategies in complex systems. Availability and implementation: The source code is freely available at https://github.com/fatemeKhezry/targetControllability.
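The Kalman rank condition referred to above states that a linear system x' = A x + B u is controllable exactly when the controllability matrix C = [B, AB, ..., A^(n-1) B] has rank n; for target controllability, only the rows of C that belong to the target vertices need to reach full row rank. The sketch below checks that condition in pure Python with exact rational arithmetic. It is an illustration of the rank test only, not the authors' greedy driver-selection algorithm, and the names `kalman_controllable`, `rank`, `targets` and the convention that A[i][j] is the influence of vertex j on vertex i are assumptions made for this example.

```python
from fractions import Fraction
from typing import List, Optional, Sequence

# Hypothetical sketch of the Kalman rank test; not the published algorithm.
Matrix = List[List[Fraction]]


def to_fractions(M: Sequence[Sequence[float]]) -> Matrix:
    return [[Fraction(x) for x in row] for row in M]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [
        [sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
         for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def rank(M: Matrix) -> int:
    """Rank via Gauss-Jordan elimination over exact rationals."""
    M = [row[:] for row in M]
    if not M:
        return 0
    rows, cols, r = len(M), len(M[0]), 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c] / M[r][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def kalman_controllable(A, B, targets: Optional[Sequence[int]] = None) -> bool:
    """Full control: rank(C) == n; target control: rank(C[targets]) == len(targets)."""
    A_f, block = to_fractions(A), to_fractions(B)
    n = len(A_f)
    blocks = []
    for _ in range(n):  # collect B, AB, ..., A^(n-1) B
        blocks.append(block)
        block = matmul(A_f, block)
    # Place the blocks side by side, row by row.
    C = [[x for blk in blocks for x in blk[i]] for i in range(n)]
    rows = range(n) if targets is None else targets
    return rank([C[i] for i in rows]) == len(rows)
```

For a directed path 1 → 2 → 3, driving vertex 1 controls the whole chain, whereas driving vertex 3 controls only vertex 3 itself, which is enough when vertex 3 is the sole target.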
