ABSTRACT: Background
There is a huge number of health-related apps available, and the numbers are growing fast. However, many of them have been developed without any kind of quality control. In an attempt to contribute to the development of high-quality apps and enable existing apps to be assessed, several guides have been developed.
Objective
The main aim of this study was to assess the interrater reliability of a new guide, the Mobile App Development and Assessment Guide (MAG), and compare it with one of the most used guides in the field, the Mobile App Rating Scale (MARS). We also examined whether the interrater reliability of the measures is consistent across multiple types of apps and stakeholders.
Methods
To study the interrater reliability of the MAG and MARS, we evaluated the 4 most downloaded health apps for chronic health conditions in the medical category of the iOS and Android stores (ie, App Store and Google Play). A group of 8 reviewers independently evaluated the quality of the apps using the MAG and MARS. The reviewers were representative of the individuals who would be most knowledgeable about and interested in the use and development of health-related apps, and included different types of stakeholders: clinical researchers, engineers, health care professionals, and end users as potential patients. We calculated the Krippendorff alpha for every category in the 2 guides, for each type of reviewer and every app, separately and combined, to study the interrater reliability.
Results
Only a few categories of the MAG and MARS demonstrated a high interrater reliability. Although the MAG was found to be superior, there was considerable variation in the scores between the different types of reviewers. The categories with the highest interrater reliability in the MAG were "Security" (α=0.78) and "Privacy" (α=0.73). In addition, 2 other categories, "Usability" and "Safety," came very close to the threshold for acceptable reliability (health care professionals: α=0.62 and 0.61, respectively). The total interrater reliability of the MAG (ie, for all categories) was 0.45, whereas the total interrater reliability of the MARS was 0.29.
Conclusions
This study shows that some categories of the MAG have significant interrater reliability. Importantly, the data show that the interrater reliability of the MAG is higher than that of the MARS, which is the most commonly used guide in the area. However, there is great variability in the responses, which seems to be associated with subjective interpretation by the reviewers.
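The Krippendorff alpha reported above is defined as α = 1 − D_o/D_e, where D_o is the disagreement observed among reviewers rating the same item and D_e is the disagreement expected by chance. The following is a minimal sketch of that calculation, not the authors' analysis code. It assumes an interval difference metric (squared difference between scores); the abstract does not state which metric the study used. The function name and the input layout (one list of scores per rated item, with None for a missing rating) are hypothetical.

```python
from itertools import permutations
from math import nan


def krippendorff_alpha_interval(units):
    """Krippendorff alpha with an interval (squared-difference) metric.

    units: list of lists; each inner list holds the scores that one rated
    item received from the reviewers, with None marking a missing rating.
    The interval metric is an assumption made for this sketch.
    """
    # Drop missing ratings, then keep only items rated at least twice,
    # because a single rating cannot be paired with another one.
    pairable = [[v for v in unit if v is not None] for unit in units]
    pairable = [unit for unit in pairable if len(unit) >= 2]

    values = [v for unit in pairable for v in unit]
    n = len(values)
    if n < 2:
        return nan

    # Observed disagreement: ordered pairs of ratings within each item,
    # weighted by 1 / (number of ratings in the item - 1).
    observed = sum(
        sum((a - b) ** 2 for a, b in permutations(unit, 2)) / (len(unit) - 1)
        for unit in pairable
    ) / n

    # Expected disagreement: ordered pairs across all pairable ratings.
    expected = sum(
        (a - b) ** 2 for a, b in permutations(values, 2)
    ) / (n * (n - 1))

    if expected == 0:
        # All ratings are identical, so alpha is undefined.
        return nan
    return 1 - observed / expected
```

Under these assumptions, the function would be called once per guide category with one inner list per app, for example krippendorff_alpha_interval([[4, 4, 5], [2, 3, None]]). Values near 1 indicate strong agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement.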
SUBMITTER: Miro J
PROVIDER: S-EPMC8094021 | biostudies-literature |
REPOSITORIES: biostudies-literature