Dataset Information

Parallel reinforcement learning for weighted multi-criteria model with adaptive margin.

ABSTRACT: This paper describes reinforcement learning (RL) for a linear family of tasks. The key point of our discussion is that the optimal solution is nonlinear even when the task family is linear, so the optimal policy cannot be obtained by a naive approach. Although an algorithm exists that computes a result equivalent to Q-learning for each task simultaneously, it suffers from an explosion of set sizes. We therefore introduce adaptive margins to overcome this difficulty.
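The nonlinearity mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the one-state, two-action problem, the discount factor, and the names `REWARD_VECTORS`, `GAMMA`, and `optimal_value` are all hypothetical choices made for illustration. Each task is a weight vector over two reward criteria, and the sketch compares the true optimal value of a mixed task with the naive weighted sum of the optimal values of the two pure tasks.

```python
# Hypothetical toy problem (an assumption, not taken from the paper):
# one state, two actions, two reward criteria, every action returns to the same state.
GAMMA = 0.5
REWARD_VECTORS = [
    (1.0, 0.0),  # action 0 pays only on criterion 0
    (0.0, 1.0),  # action 1 pays only on criterion 1
]


def optimal_value(weights, n_iters=200):
    """Value iteration for the scalarized reward sum_i weights[i] * r_i(a)."""
    rewards = [sum(w * r for w, r in zip(weights, vec)) for vec in REWARD_VECTORS]
    v = 0.0
    for _ in range(n_iters):
        # Bellman optimality update for the single state.
        v = max(r + GAMMA * v for r in rewards)
    return v


v_task0 = optimal_value((1.0, 0.0))  # optimal value when only criterion 0 counts
v_task1 = optimal_value((0.0, 1.0))  # optimal value when only criterion 1 counts
v_mixed = optimal_value((0.5, 0.5))  # optimal value of the equally weighted task

# Naive approach: linearly combine the optimal values of the pure tasks.
naive_mixed = 0.5 * v_task0 + 0.5 * v_task1

print(v_task0, v_task1, v_mixed, naive_mixed)
```

In this toy case the naive combination (about 2.0) overestimates the true optimal value of the mixed task (about 1.0), because no single policy is optimal for both pure tasks at once. The optimal value is a maximum over policies of functions that are linear in the weights, so it is piecewise linear rather than linear.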

SUBMITTER: Hiraoka K

PROVIDER: S-EPMC2645492 | biostudies-other | 2009 Mar

REPOSITORIES: biostudies-other

Similar Datasets

Parallel model-based and model-free reinforcement learning for card sorting performance.

Project description: The Wisconsin Card Sorting Test (WCST) is considered a gold standard for the assessment of cognitive flexibility. On the WCST, repeating a sorting category following negative feedback is typically treated as indicating reduced cognitive flexibility. Therefore such responses are referred to as 'perseveration' errors. Recent research suggests that the propensity for perseveration errors is modulated by response demands: They occur less frequently when their commitment repeats the previously executed response. Here, we propose parallel reinforcement-learning models of card sorting performance, which assume that card sorting performance can be conceptualized as resulting from model-free reinforcement learning at the level of responses that occurs in parallel with model-based reinforcement learning at the categorical level. We compared parallel reinforcement-learning models with purely model-based reinforcement learning, and with the state-of-the-art attentional-updating model. We analyzed data from 375 participants who completed a computerized WCST. Parallel reinforcement-learning models showed the best predictive accuracies for the majority of participants. Only parallel reinforcement-learning models accounted for the modulation of perseveration propensity by response demands. In conclusion, parallel reinforcement-learning models provide a new theoretical perspective on card sorting and offer a suitable framework for discerning individual differences in latent processes that subserve behavioral flexibility.

| S-EPMC7508815 | biostudies-literature

Adaptive model selection in photonic reservoir computing by reinforcement learning.

Project description: Photonic reservoir computing is an emergent technology toward beyond-Neumann computing. Although photonic reservoir computing provides superior performance in environments whose characteristics are coincident with the training datasets for the reservoir, the performance is significantly degraded if these characteristics deviate from the original knowledge used in the training phase. Here, we propose a scheme of adaptive model selection in photonic reservoir computing using reinforcement learning. In this scheme, a temporal waveform is generated by different dynamic source models that change over time. The system autonomously identifies the best source model for the task of time series prediction using photonic reservoir computing and reinforcement learning. We prepare two types of output weights for the source models, and the system adaptively selects the correct model using reinforcement learning, where the prediction errors are associated with rewards. We succeed in adaptive model selection when the source signal is temporally mixed, having originally been generated by two different dynamic system models, as well as when the signal is a mixture from the same model but with different parameter values. This study paves the way for autonomous behavior in photonic artificial intelligence and could lead to new applications in load forecasting and multi-objective control, where frequent environment changes are expected.

| S-EPMC7308406 | biostudies-literature

Multi-agent reinforcement learning with approximate model learning for competitive games.

Project description: We propose a method for learning multi-agent policies to compete against multiple opponents. The method consists of recurrent neural network-based actor-critic networks and deterministic policy gradients that promote cooperation between agents by communication. The learning process does not require access to opponents' parameters or observations because the agents are trained separately from the opponents. The actor networks enable the agents to communicate using forward and backward paths while the critic network helps to train the actors by delivering them gradient signals based on their contribution to the global reward. Moreover, to address nonstationarity due to the evolution of other agents, we propose approximate model learning using auxiliary prediction networks for modeling the state transitions, reward function, and opponent behavior. In the test phase, we use competitive multi-agent environments to demonstrate by comparison the usefulness and superiority of the proposed method in terms of learning efficiency and goal achievements. The comparison results show that the proposed method outperforms the alternatives.

| S-EPMC6739057 | biostudies-literature

Reinforcement learning of adaptive control strategies

Project description: Not available

| S-EPMC11332247 | biostudies-literature

Multicategory Outcome Weighted Margin-based Learning for Estimating Individualized Treatment Rules.

Project description: Due to the heterogeneity of many chronic diseases, precise personalized medicine, also known as precision medicine, has drawn increasing attention in the scientific community. One main goal of precision medicine is to develop the most effective tailored therapy for each individual patient. To that end, one needs to incorporate individual characteristics to detect a proper individual treatment rule (ITR), by which suitable decisions on treatment assignments can be made to optimize patients' clinical outcome. For binary treatment settings, outcome weighted learning (OWL) and several of its variations have been proposed recently to estimate the ITR by optimizing the conditional expected outcome given patients' information. However, for multiple treatment scenarios, it remains unclear how to use OWL effectively. It can be shown that some direct extensions of OWL for multiple treatments, such as one-versus-one and one-versus-rest methods, can yield suboptimal performance. In this paper, we propose a new learning method, named Multicategory Outcome weighted Margin-based Learning (MOML), for estimating ITR with multiple treatments. Our proposed method is very general and covers OWL as a special case. We show Fisher consistency for the estimated ITR, and establish convergence rate properties. Variable selection using the sparse ℓ1 penalty is also considered. Analysis of simulated examples and a type 2 diabetes mellitus observational study are used to demonstrate competitive performance of the proposed method.

| S-EPMC7731977 | biostudies-literature

Learning agility and adaptive legged locomotion via curricular hindsight reinforcement learning.

Project description: Agile and adaptive maneuvers such as fall recovery, high-speed turning, and sprinting in the wild are challenging for legged systems. We propose a Curricular Hindsight Reinforcement Learning (CHRL) method that learns an end-to-end tracking controller that achieves powerful agility and adaptation for the legged robot. The two key components are (i) a novel automatic curriculum strategy on task difficulty and (ii) a Hindsight Experience Replay strategy adapted to legged locomotion tasks. We demonstrated successful agile and adaptive locomotion on a real quadruped robot that performed fall recovery autonomously, coherent trotting, sustained outdoor running speeds up to 3.45 m/s, and a maximum yaw rate of 3.2 rad/s. This system produces adaptive behaviors responding to changing situations and unexpected disturbances on natural terrains like grass and dirt.

| S-EPMC11564515 | biostudies-literature

Asymmetric and adaptive reward coding via normalized reinforcement learning.

Project description:Learning is widely modeled in psychology, neuroscience, and computer science by prediction error-guided reinforcement learning (RL) algorithms. While standard RL assumes linear reward functions, reward-related neural activity is a saturating, nonlinear function of reward; however, the computational and behavioral implications of nonlinear RL are unknown. Here, we show that nonlinear RL incorporating the canonical divisive normalization computation introduces an intrinsic and tunable asymmetry in prediction error coding. At the behavioral level, this asymmetry explains empirical variability in risk preferences typically attributed to asymmetric learning rates. At the neural level, diversity in asymmetries provides a computational mechanism for recently proposed theories of distributional RL, allowing the brain to learn the full probability distribution of future rewards. This behavioral and computational flexibility argues for an incorporation of biologically valid value functions in computational models of learning and decision-making.

| S-EPMC9345478 | biostudies-literature

Adaptively Weighted Large Margin Classifiers.

Project description:Large margin classifiers have been shown to be very useful in many applications. The Support Vector Machine is a canonical example of large margin classifiers. Despite their flexibility and ability in handling high dimensional data, many large margin classifiers have serious drawbacks when the data are noisy, especially when there are outliers in the data. In this paper, we propose a new weighted large margin classification technique. The weights are chosen adaptively with data. The proposed classifiers are shown to be robust to outliers and thus are able to produce more accurate classification results.

| S-EPMC3867158 | biostudies-literature

Deep reinforcement learning for data-driven adaptive scanning in ptychography.

Project description:We present a method that lowers the dose required for an electron ptychographic reconstruction by adaptively scanning the specimen, thereby providing the required spatial information redundancy in the regions of highest importance. The proposed method is built upon a deep learning model that is trained by reinforcement learning, using prior knowledge of the specimen structure from training data sets. We show that using adaptive scanning for electron ptychography outperforms alternative low-dose ptychography experiments in terms of reconstruction resolution and quality.

| S-EPMC10229550 | biostudies-literature

A neural model of hierarchical reinforcement learning.

Project description:We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in the brain. This model incorporates a broad range of biological features that pose challenges to neural RL, such as temporally extended action sequences, continuous environments involving unknown time delays, and noisy/imprecise computations. Most significantly, we expand the model into the realm of hierarchical reinforcement learning (HRL), which divides the RL process into a hierarchy of actions at different levels of abstraction. Here we implement all the major components of HRL in a neural model that captures a variety of known anatomical and physiological properties of the brain. We demonstrate the performance of the model in a range of different environments, in order to emphasize the aim of understanding the brain's general reinforcement learning ability. These results show that the model compares well to previous modelling work and demonstrates improved performance as a result of its hierarchical ability. We also show that the model's behaviour is consistent with available data on human hierarchical RL, and generate several novel predictions.

| S-EPMC5500327 | biostudies-literature
