Dataset Information

Technical Metrics Used to Evaluate Health Care Chatbots: Scoping Review.

ABSTRACT:

Background

Dialog agents (chatbots) have a long history of application in health care, where they have been used for tasks such as supporting patient self-management and providing counseling. Their use is expected to grow with increasing demands on health systems and improving artificial intelligence (AI) capability. Approaches to the evaluation of health care chatbots, however, appear to be diverse and haphazard, resulting in a potential barrier to the advancement of the field.

Objective

This study aims to identify the technical (nonclinical) metrics used by previous studies to evaluate health care chatbots.

Methods

Studies were identified by searching 7 bibliographic databases (eg, MEDLINE and PsycINFO) in addition to conducting backward and forward reference list checking of the included studies and relevant reviews. The studies were independently selected by two reviewers who then extracted data from the included studies. Extracted data were synthesized narratively by grouping the identified metrics into categories based on the aspect of chatbots that the metrics evaluated.

Results

Of the 1498 citations retrieved, 65 studies were included in this review. Chatbots were evaluated using 27 technical metrics, which were related to chatbots as a whole (eg, usability, classifier performance, speed), response generation (eg, comprehensibility, realism, repetitiveness), response understanding (eg, chatbot understanding as assessed by users, word error rate, concept error rate), and esthetics (eg, appearance of the virtual agent, background color, and content).
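
Of the response-understanding metrics listed above, word error rate has a widely used conventional definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system's recognized text, divided by the number of words in the reference. The sketch below illustrates that conventional definition only. It is not taken from the reviewed studies, and the function name and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which real evaluations may replace with
    normalization (case folding, punctuation stripping, etc.).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of four reference words.
print(word_error_rate("i have a headache", "i have headache"))
```

Because the denominator is the reference length, the value can exceed 1 when the hypothesis contains many inserted words.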

Conclusions

The technical metrics of health chatbot studies were diverse, with survey designs and global usability metrics dominating. The lack of standardization and paucity of objective measures make it difficult to compare the performance of health chatbots and could inhibit advancement of the field. We suggest that researchers more frequently include metrics computed from conversation logs. In addition, we recommend the development of a framework of technical metrics with recommendations for specific circumstances for their inclusion in chatbot studies.

SUBMITTER: Abd-Alrazaq A

PROVIDER: S-EPMC7305563 | biostudies-literature | 2020 Jun

REPOSITORIES: biostudies-literature

Publications

Technical Metrics Used to Evaluate Health Care Chatbots: Scoping Review.

Alaa Abd-Alrazaq, Zeineb Safi, Mohannad Alajlani, Jim Warren, Mowafa Househ, Kerstin Denecke

Journal of Medical Internet Research, 2020 Jun 5, issue 6

PMID: 32442157

Similar Datasets

Technical Aspects of Developing Chatbots for Medical Applications: Scoping Review. | S-EPMC7775817 | biostudies-literature

Health Chatbots in Africa: Scoping Review. | S-EPMC10337242 | biostudies-literature

One Health timeliness metrics to track and evaluate outbreak response reporting: A scoping review. | S-EPMC9463558 | biostudies-literature

AI Chatbots for Psychological Health for Health Professionals: Scoping Review. | S-EPMC11939020 | biostudies-literature

Reporting of Fairness Metrics in Clinical Risk Prediction Models Used for Precision Health: Scoping Review. | S-EPMC11966066 | biostudies-literature

Methods and Measures Used to Evaluate Patient-Operated Mobile Health Interventions: Scoping Literature Review. | S-EPMC7226051 | biostudies-literature

Perceptions and Opinions of Patients About Mental Health Chatbots: Scoping Review. | S-EPMC7840290 | biostudies-literature

The Development and Use of Chatbots in Public Health: Scoping Review. | S-EPMC9536768 | biostudies-literature

Postdischarge Outcome Domains in Pediatric Critical Care and the Instruments Used to Evaluate Them: A Scoping Review. | S-EPMC7708523 | biostudies-literature

Chatbots for Smoking Cessation: Scoping Review. | S-EPMC9514452 | biostudies-literature