Project description:We introduce the notion of a reinforcement quantum annealing (RQA) scheme in which an intelligent agent searches the space of Hamiltonians and interacts with a quantum annealer that plays the role of the stochastic environment from learning automata. At each iteration of RQA, after analyzing the results (samples) from the previous iteration, the agent adjusts the penalties of unsatisfied constraints and re-casts the given problem as a new Ising Hamiltonian. As a proof of concept, we propose a novel approach for casting the problem of Boolean satisfiability (SAT) to Ising Hamiltonians and show how to apply RQA to increase the probability of finding the global optimum. Our experimental results on two different benchmark SAT problems (namely, factoring pseudo-prime numbers and random SAT with phase transitions), using a D-Wave 2000Q quantum processor, demonstrate that RQA finds notably better solutions with fewer samples than the best-known techniques in the realm of quantum annealing.
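A minimal sketch of the reinforcement loop described above, assuming a weighted-clause penalty formulation and using a classical random sampler as a stand-in for the D-Wave annealer; the function names, the penalty encoding, and the reinforcement rule are illustrative, not the paper's exact construction:

    import random

    def violated(clauses, assignment):
        # Indices of clauses not satisfied by the assignment.
        # A clause is a list of signed ints: +i means x_i must be True, -i means x_i must be False.
        return [k for k, c in enumerate(clauses)
                if not any(assignment[abs(l)] == (l > 0) for l in c)]

    def energy(clauses, weights, assignment):
        # Weighted count of violated clauses: the cost the annealer would minimize.
        return sum(weights[k] for k in violated(clauses, assignment))

    def rqa(clauses, n_vars, iterations=10, samples_per_iter=500, reinforcement=1.0):
        weights = [1.0] * len(clauses)                     # initial clause penalties
        best, best_count = None, float("inf")
        for _ in range(iterations):
            # Stand-in for one annealer call: draw random assignments, keep the lowest-energy one.
            batch = [{v: random.random() < 0.5 for v in range(1, n_vars + 1)}
                     for _ in range(samples_per_iter)]
            sample = min(batch, key=lambda a: energy(clauses, weights, a))
            bad = violated(clauses, sample)
            if len(bad) < best_count:
                best, best_count = sample, len(bad)
            for k in bad:                                  # reinforce penalties of unsatisfied clauses
                weights[k] += reinforcement
        return best, best_count

    # Example: (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
    print(rqa([[1, 2], [-1, 3], [-2, -3]], n_vars=3))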
Project description:In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youths because they have less experience to draw on than adults. Some studies suggest probabilistic learning may be less efficient in youths than in adults, while others suggest it may be more efficient in youths in mid-adolescence. Here we used a probabilistic reinforcement learning task to test how youths aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants' performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by (1) an increase in learning rate (i.e., a decrease in integration time scale) and (2) a decrease in noisy/exploratory choices. In mid-adolescence (ages 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
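As an illustration of the kind of model favored here, a toy simulation of a two-option probabilistic task with an asymmetric delta rule (a separate, smaller learning rate for negative prediction errors), a softmax inverse temperature, and an undirected-noise parameter; all parameter values are placeholders rather than fitted estimates:

    import math, random

    def simulate(trials, alpha_pos=0.3, alpha_neg=0.0, beta=5.0, epsilon=0.1):
        # Two-option task: option 0 is rewarded with probability 0.8, option 1 with 0.2.
        q = [0.5, 0.5]
        correct = 0
        for _ in range(trials):
            # Softmax choice blended with undirected (epsilon) noise.
            p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
            p0 = (1 - epsilon) * p0 + epsilon * 0.5
            choice = 0 if random.random() < p0 else 1
            reward = 1.0 if random.random() < (0.8 if choice == 0 else 0.2) else 0.0
            # Asymmetric delta rule: negative prediction errors get a smaller (here zero) learning rate.
            delta = reward - q[choice]
            q[choice] += (alpha_pos if delta > 0 else alpha_neg) * delta
            correct += (choice == 0)
        return correct / trials

    print(simulate(200))   # fraction of choices of the better option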
Project description:Theoretical models of bipolar disorders (BD) posit core deficits in reward system function. However, specifying which of the reward system's multiple neurobehavioral processes are abnormal in BD is necessary to develop appropriately targeted interventions. Research on probabilistic reinforcement learning deficits in BD is limited, particularly during adolescence, a period of significant neurodevelopmental change in the reward system. The present study investigated probabilistic reinforcement learning, using a probabilistic selection task (PST), and its correlates, using self-reported reward/threat sensitivities and cognitive tasks, in 104 adolescents with and without BD. Compared with healthy peers, adolescents with BD were less likely to persist with their choices after prior positive feedback (i.e., lower win-stay rates) in the PST's acquisition phase. Across groups, a greater win-stay rate appeared to be a more efficient learning strategy, associated with fewer acquisition trials and better testing-phase performance. Win-stay rates were also related to verbal learning indices, but not to self-reported reward/threat sensitivities. Finally, lower win-stay rates had significant incremental validity in predicting a BD diagnosis after accounting for the effects of current symptoms, reward sensitivities, verbal learning, and IQ. The present findings support multiple dysfunctional processes of the reward system in adolescent BD that require additional examination.
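A small sketch of how a win-stay rate of the sort analyzed here might be computed from acquisition-phase trial records; the (pair, choice, feedback) record format is hypothetical, chosen only for the example:

    def win_stay_rate(trials):
        # trials: list of (stimulus_pair, choice, positive_feedback) tuples in presentation order;
        # this record format is hypothetical, chosen only for the example.
        last_choice, last_won = {}, {}
        stays = wins = 0
        for pair, choice, feedback in trials:
            if last_won.get(pair):                  # previous trial with this pair was a "win"
                wins += 1
                stays += (choice == last_choice[pair])
            last_choice[pair], last_won[pair] = choice, feedback
        return stays / wins if wins else float("nan")

    example = [("AB", "A", True), ("AB", "A", True), ("AB", "B", False), ("AB", "A", False)]
    print(win_stay_rate(example))   # stayed after 1 of 2 wins -> 0.5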
Project description:Background: Anhedonia (a reduced experience of pleasure) and avolition (a reduction in goal-directed activity) are common features of schizophrenia that have substantial effects on functional outcome, but are poorly understood and treated. Here, we examined whether alterations in reinforcement learning may contribute to these symptoms in schizophrenia by impairing the translation of reward information into goal-directed action. Methods: 38 stable outpatients with schizophrenia or schizoaffective disorder and 37 healthy controls underwent fMRI during a probabilistic stimulus selection reinforcement learning task with dissociated choice- and feedback-related activation, followed by a behavioral transfer task allowing separate assessment of learning from positive versus negative outcomes. A Q-learning algorithm was used to examine functional activation relating to prediction error at the time of feedback and to expected value at the time of choice. Results: Behavioral results suggested a reduction in learning from positive feedback in patients; however, this reduction was unrelated to anhedonia/avolition severity. On fMRI analysis, prediction error-related activation at the time of feedback was highly similar between patients and controls. During early learning, patients activated regions in the cognitive control network to a lesser extent than controls. Correlation analyses revealed reduced responses to positive feedback in dorsolateral prefrontal cortex and caudate among those patients higher in anhedonia/avolition. Conclusions: Together, these results suggest that anhedonia/avolition are as strongly related to cortical learning or higher-level processes involved in goal-directed behavior, such as effort computation and planning, as to striatally mediated learning mechanisms.
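An illustrative pass of the kind of Q-learning computation described, returning the trial-wise expected value at choice and prediction error at feedback that would serve as model-derived fMRI regressors; the learning rate and initial values are placeholders, not fitted quantities:

    def q_learning_regressors(choices, rewards, alpha=0.2):
        # choices: option chosen on each trial (0 or 1); rewards: feedback on each trial (0 or 1).
        # Returns the trial-wise expected value at choice and prediction error at feedback,
        # the model-derived quantities one would turn into fMRI regressors.
        q = [0.5, 0.5]
        expected_value, prediction_error = [], []
        for c, r in zip(choices, rewards):
            expected_value.append(q[c])    # value of the chosen option, time of choice
            delta = r - q[c]               # prediction error, time of feedback
            prediction_error.append(delta)
            q[c] += alpha * delta
        return expected_value, prediction_error

    ev, pe = q_learning_regressors([0, 0, 1, 0], [1, 0, 0, 1])
    print(ev)
    print(pe)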
Project description:Summary: Electroactive polymer (EAP) hydrogels are an active-matter material used as actuators in soft robotics. Hydrogels exhibit active-matter behavior through a form of memory and can be used to embody memory systems such as automata. This study exploited EAP responses, finding that EAP memory functions could be utilized for automaton and reservoir computing frameworks. Under sequential electrical stimulation, the mechanical responses of EAPs were represented in a probabilistic Moore automaton framework and expanded by shaping the reservoir's energy landscape. The EAP automaton reservoir's computational ability was compared with digital computation to assess EAPs as computational resources. We found that the computation in the EAP's reaction to stimuli can be represented through automaton structures, revealing a potential bridge between the EAP's use as an integrated actuator and controller; that is, our automaton framework could potentially lead to control systems in which the computation is embedded in the medium's dynamical responses. Highlights: EAP gel memory mechanics were demonstrated via voltage potential measurements; probabilistic Moore automata were constructed from EAP gel responses to stimulation; tuning the response encoding created a computational reservoir; the reservoir was shown to be more memory efficient than general digital alternatives.
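A minimal sketch of a probabilistic Moore machine of the kind the description refers to: outputs attached to states and input-conditioned stochastic transitions. The two-state "relaxed"/"contracted" toy and its transition probabilities are invented for illustration and are not measured EAP responses:

    import random

    class ProbabilisticMoore:
        # outputs: state -> output symbol; transitions: (state, input) -> {next_state: probability}.
        def __init__(self, outputs, transitions, start):
            self.outputs, self.transitions, self.state = outputs, transitions, start

        def step(self, symbol):
            # Sample the next state from the input-conditioned distribution,
            # then emit the output attached to that state (Moore: output depends on state only).
            dist = self.transitions[(self.state, symbol)]
            r, acc = random.random(), 0.0
            for nxt, p in dist.items():
                acc += p
                if r <= acc:
                    self.state = nxt
                    break
            return self.outputs[self.state]

    # Two-state toy standing in for a gel's "relaxed"/"contracted" responses to voltage pulses.
    m = ProbabilisticMoore(
        outputs={"relaxed": 0, "contracted": 1},
        transitions={("relaxed", "pulse"): {"contracted": 0.8, "relaxed": 0.2},
                     ("contracted", "pulse"): {"contracted": 0.6, "relaxed": 0.4}},
        start="relaxed")
    print([m.step("pulse") for _ in range(5)])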
Project description:Schizophrenia spectrum disorders (SZ) are characterized by impairments in probabilistic reinforcement learning (RL), which is associated with dopaminergic circuitry encompassing the prefrontal cortex and basal ganglia. However, there are no studies examining dopaminergic genes with respect to probabilistic RL in SZ. Thus, the aim of our study was to examine the impact of dopaminergic genes on performance in the Probabilistic Selection Task (PST) in patients with SZ in comparison to healthy control (HC) subjects. We included 138 SZ patients and 188 HC participants. Genetic analysis was performed for the following polymorphisms: rs4680 in COMT; rs907094 in DARPP-32; rs2734839, rs936461, rs1800497, and rs6277 in DRD2; rs747302 and rs1800955 in DRD4; and rs28363170 and rs2975226 in DAT1. The probabilistic RL task was completed by 59 SZ patients and 95 HC subjects. SZ patients were significantly worse at acquiring reinforcement contingencies during the task than HCs. We found no significant association between the genetic polymorphisms and RL among SZ patients; however, among HC participants, individuals with the DAT1 rs28363170 10-repeat genotype performed better than 9-repeat allele carriers. The present study indicates the relevance of the DAT1 rs28363170 polymorphism to RL in HC participants.
Project description:We present an algorithm for active learning of deterministic timed automata with a single clock. The algorithm is within the framework of Angluin's L* algorithm, in which the learner poses membership and equivalence queries to a teacher.
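Since the learner's queries are answered over timed words, here is a small runnable sketch of a one-clock deterministic timed automaton and the membership check an Angluin-style learner would pose; the toy automaton, its guard format, and the accepting-state set are all invented for illustration:

    def accepts(transitions, accepting, word, start="q0"):
        # transitions: (state, action) -> list of (guard_lo, guard_hi, reset_clock, next_state) rules.
        # word: a timed word, i.e. a list of (action, delay) pairs; a single clock is tracked.
        state, clock = start, 0.0
        for action, delay in word:
            clock += delay
            for lo, hi, reset, nxt in transitions.get((state, action), []):
                if lo <= clock <= hi:
                    state = nxt
                    if reset:
                        clock = 0.0
                    break
            else:
                return False               # no enabled transition: reject the timed word
        return state in accepting

    # Toy automaton: accepts runs of a's in which consecutive a's are at most 2 time units apart.
    toy = {("q0", "a"): [(0.0, 2.0, True, "q0")]}
    print(accepts(toy, {"q0"}, [("a", 1.0), ("a", 1.5)]))   # True
    print(accepts(toy, {"q0"}, [("a", 1.0), ("a", 3.0)]))   # False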
Project description:Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions. However, human learning also implicates prefrontal cortical mechanisms involved in higher level cognitive functions. The interaction of these systems remains poorly understood, and models of human behavior often ignore working memory (WM) and therefore incorrectly assign behavioral variance to the RL system. Here we designed a task that highlights the profound entanglement of these two processes, even in simple learning problems. By systematically varying the size of the learning problem and delay between stimulus repetitions, we separately extracted WM-specific effects of load and delay on learning. We propose a new computational model that accounts for the dynamic integration of RL and WM processes observed in subjects' behavior. Incorporating capacity-limited WM into the model allowed us to capture behavioral variance that could not be captured in a pure RL framework even if we (implausibly) allowed separate RL systems for each set size. The WM component also allowed for a more reasonable estimation of a single RL process. Finally, we report effects of two genetic polymorphisms having relative specificity for prefrontal and basal ganglia functions. Whereas the COMT gene coding for catechol-O-methyl transferase selectively influenced model estimates of WM capacity, the GPR6 gene coding for G-protein-coupled receptor 6 influenced the RL learning rate. Thus, this study allowed us to specify distinct influences of the high-level and low-level cognitive functions on instrumental learning, beyond the possibilities offered by simple RL models.
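A rough sketch of the kind of RL-plus-working-memory mixture the study argues for: a fast one-shot WM store that is capacity-limited and decays across trials, mixed with an incremental RL learner so that reliance on WM shrinks with set size. The mixture rule, decay scheme, and parameter values are simplifications for illustration, not the authors' fitted model:

    import math, random

    def softmax(values, beta):
        exps = [math.exp(beta * v) for v in values]
        total = sum(exps)
        return [e / total for e in exps]

    def simulate_block(stimuli, n_actions=3, alpha=0.1, beta=8.0,
                       capacity=3.0, decay=0.1, wm_weight=0.9):
        # stimuli: sequence of stimulus ids for one block; the set size is the number of distinct ids.
        set_size = len(set(stimuli))
        q_rl = {s: [1.0 / n_actions] * n_actions for s in set(stimuli)}   # slow incremental RL values
        q_wm = {s: [1.0 / n_actions] * n_actions for s in set(stimuli)}   # fast one-shot, decaying WM store
        w = wm_weight * min(1.0, capacity / set_size)     # reliance on WM shrinks with set size
        correct_action = {s: s % n_actions for s in set(stimuli)}         # toy stimulus-action mapping
        hits = 0
        for s in stimuli:
            # Policy: mixture of WM-based and RL-based action probabilities.
            p = [w * pw + (1 - w) * pr
                 for pw, pr in zip(softmax(q_wm[s], beta), softmax(q_rl[s], beta))]
            a = random.choices(range(n_actions), weights=p)[0]
            r = 1.0 if a == correct_action[s] else 0.0
            hits += r
            q_rl[s][a] += alpha * (r - q_rl[s][a])        # incremental RL update
            q_wm[s][a] = r                                # one-shot WM storage of the last outcome
            for st in q_wm:                               # WM decays toward uniform on every trial (delay cost)
                q_wm[st] = [v + decay * (1.0 / n_actions - v) for v in q_wm[st]]
        return hits / len(stimuli)

    print(simulate_block([0, 1, 2] * 10))          # set size 3: WM does most of the work
    print(simulate_block([0, 1, 2, 3, 4, 5] * 5))  # set size 6: performance leans on slow RL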
Project description:Deep learning is one of the most advanced forms of machine learning. Most modern deep learning models are based on artificial neural networks, and benchmarking studies reveal that neural networks have produced results comparable to, and in some cases superior to, those of human experts. However, the resulting neural networks are typically regarded as incomprehensible black-box models, which not only limits their applications but also hinders testing and verification. In this paper, we present an active learning framework for extracting automata from neural network classifiers, which can help users understand the classifiers. In more detail, we use Angluin's L* algorithm as the learner and the neural network under learning as the oracle, employing an abstraction of the neural network to answer membership and equivalence queries. Our abstraction consists of value, symbol, and word abstractions. The factors that may affect the abstraction are also discussed in the paper. We have implemented our approach in a prototype. To evaluate it, we applied the prototype to an MNIST classifier and found that the abstraction with interval number 2 and block size 1 × 28 offers the best performance in terms of F1 score. We also compared our extracted DFA against the DFAs learned by the passive learning algorithms provided in LearnLib, and the experimental results show that our DFA gives better performance on the MNIST dataset.
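A toy sketch of the teacher side: answering a membership query by concretizing an abstract word (a sequence of interval symbols, one per block) and running it through the classifier. The stub mean-threshold "classifier", the two-interval value abstraction, and the block size are stand-ins for illustration, not the paper's MNIST setup:

    INTERVALS = [(0.0, 0.5), (0.5, 1.0)]   # value abstraction: two intervals over input intensities
    BLOCK = 4                              # symbol abstraction: each symbol covers a block of 4 values

    def stub_classifier(x):
        # Stand-in for a trained network: "accept" inputs whose mean intensity exceeds 0.5.
        return sum(x) / len(x) > 0.5

    def concretize(word):
        # An abstract word is a sequence of interval indices, one per block;
        # concretize it by filling each block with its interval midpoint.
        x = []
        for sym in word:
            lo, hi = INTERVALS[sym]
            x.extend([(lo + hi) / 2] * BLOCK)
        return x

    def membership(word):
        # Membership query for the learner: does the classifier accept the concretized word?
        return stub_classifier(concretize(word))

    print(membership([1, 1, 0]))   # mostly high-intensity blocks -> True
    print(membership([0, 0, 1]))   # mostly low-intensity blocks  -> False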
Project description:In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) in high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction process. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with the largest marginal effect from the immediate split, the constructed tree uses the available samples more efficiently. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards the terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate the asymptotic properties of the proposed method under basic assumptions and discuss the rationale in general settings.
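A toy sketch of the core split-selection idea: scoring a candidate variable by the improvement it makes available to subsequent splits (here, a one-step exhaustive lookahead with variance reduction) rather than by its immediate marginal gain. This is a simplification for illustration, not the paper's embedded-model procedure:

    def variance(y):
        if not y:
            return 0.0
        m = sum(y) / len(y)
        return sum((v - m) ** 2 for v in y) / len(y)

    def split(X, y, j, t):
        left = [(x, v) for x, v in zip(X, y) if x[j] <= t]
        right = [(x, v) for x, v in zip(X, y) if x[j] > t]
        return left, right

    def best_immediate(X, y, j):
        # Best variance reduction obtainable from a single split on variable j.
        base, best = variance(y) * len(y), 0.0
        for t in sorted(set(x[j] for x in X))[:-1]:
            left, right = split(X, y, j, t)
            red = base - variance([v for _, v in left]) * len(left) \
                       - variance([v for _, v in right]) * len(right)
            best = max(best, red)
        return best

    def lookahead_score(X, y, j, p):
        # Score variable j by the best improvement it makes available to the next split
        # in its children: a one-step stand-in for "greatest future improvement".
        score = 0.0
        for t in sorted(set(x[j] for x in X))[:-1]:
            left, right = split(X, y, j, t)
            child_gain = 0.0
            for side in (left, right):
                Xs, ys = [x for x, _ in side], [v for _, v in side]
                if len(ys) > 1:
                    child_gain += max(best_immediate(Xs, ys, k) for k in range(p))
            score = max(score, child_gain)
        return score

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]                      # XOR target: neither variable has a marginal effect
    print(best_immediate(X, y, 0))        # 0.0  (an immediate split on x0 looks useless)
    print(lookahead_score(X, y, 0, p=2))  # 1.0  (yet it unlocks a perfect second split)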