Dataset Information

A Hybrid PAC Reinforcement Learning Algorithm for Human-Robot Interaction.

ABSTRACT: This paper offers a new hybrid probably approximately correct (PAC) reinforcement learning (RL) algorithm for Markov decision processes (MDPs) that intelligently maintains favorable features of both model-based and model-free methodologies. The designed algorithm, referred to as the Dyna-Delayed Q-learning (DDQ) algorithm, combines model-free Delayed Q-learning and model-based R-max algorithms while outperforming both in most cases. The paper includes a PAC analysis of the DDQ algorithm and a derivation of its sample complexity. Numerical results are provided to support the claim regarding the new algorithm's sample efficiency compared to its parents as well as the best known PAC model-free and model-based algorithms in application. A real-world experimental implementation of DDQ in the context of pediatric motor rehabilitation facilitated by infant-robot interaction highlights the potential benefits of the reported method.

SUBMITTER: Zehfroosh A

PROVIDER: S-EPMC8982074 | biostudies-literature | 2022

REPOSITORIES: biostudies-literature

Publications

A Hybrid PAC Reinforcement Learning Algorithm for Human-Robot Interaction.

Ashkan Zehfroosh, Herbert G. Tanner

Frontiers in Robotics and AI, 2022-03-09

PMID: 35391942

Similar Datasets

Project description:

Background: Rehabilitation robotics is progressing towards developing robots that can be used as advanced tools to augment the role of a therapist. These robots are capable of not only offering more frequent and more accessible therapies but also providing new insights into treatment effectiveness based on their ability to measure interaction parameters. A requirement for having more advanced therapies is to identify how robots can 'adapt' to each individual's needs at different stages of recovery. Hence, our research focused on developing an adaptive interface for the GENTLE/A rehabilitation system. The interface was based on a lead-lag performance model utilising the interaction between the human and the robot. The goal of the present study was to test the adaptability of the GENTLE/A system to the performance of the user.

Methods: Point-to-point movements were executed using the HapticMaster (HM) robotic arm, the main component of the GENTLE/A rehabilitation system. The points were displayed as balls on the screen and some of the points also had a real object, providing a test-bed for the human-robot interaction (HRI) experiment. The HM was operated in various modes to test the adaptability of the GENTLE/A system based on the leading/lagging performance of the user. Thirty-two healthy participants took part in the experiment, comprising a training phase followed by the actual-performance phase.

Results: The leading or lagging role of the participant could be used successfully to adjust the duration required by that participant to execute point-to-point movements, in various modes of robot operation and under various conditions. The adaptability of the GENTLE/A system was clearly evident from the durations recorded. The regression results showed that the participants required lower execution times with the help of a real object when compared to just a virtual object. The 'reaching away' movements took longer to execute than the 'returning towards' movements, irrespective of the influence of gravity on the direction of the movement.

Conclusions: The GENTLE/A system was able to adapt so that the duration required to execute a point-to-point movement was according to the leading or lagging performance of the user with respect to the robot. This adaptability could be useful in clinical settings when stroke subjects interact with the system and could also serve as an assessment parameter across various interaction sessions. As the system adapts to user input, and as the task becomes easier through practice, the robot would auto-tune for more demanding and challenging interactions. The improvement in performance of the participants in an embedded environment when compared to a virtual environment also shows promise for clinical applicability, to be tested in due time. Studying the physiology of the upper arm to understand the muscle groups involved, and their influence on the various movements executed during this study, forms a key part of our future work.
