ABSTRACT: Importance
Accurate assessment of wound area and percentage of granulation tissue (PGT) is important for optimizing wound care and healing outcomes. Artificial intelligence (AI)-based wound assessment tools have the potential to improve the accuracy and consistency of wound area and PGT measurement while improving the efficiency of wound care workflows.
Objective
To develop a quantitative and qualitative method to evaluate AI-based wound assessment tools compared with expert human assessments.
Design, setting, and participants
This diagnostic study was performed across 2 independent wound centers using deidentified wound photographs collected for routine care (site 1, 110 photographs taken between May 1 and 31, 2018; site 2, 89 photographs taken between January 1 and December 31, 2019). Digital wound photographs were selected chronologically from the electronic medical records of the general population of patients visiting the wound centers. For inclusion in the study, the complete wound edge and a ruler were required to be visible; circumferential ulcers were specifically excluded. Four wound specialists (2 per site) and an AI-based wound assessment service independently traced wound area and granulation tissue.
Main outcomes and measures
The quantitative performance of AI tracings was evaluated by statistically comparing error measure distributions between test AI traces and reference human traces (AI vs human) with error distributions between independent traces by 2 humans (human vs human). Quantitative outcomes were statistically significant differences in the error measures of false-negative area (FNA), false-positive area (FPA), and absolute relative error (ARE) between the AI vs human and human vs human comparisons of wound area and granulation tissue tracings. Six masked attending physician reviewers (3 per site) viewed randomized area tracings from the AI and human annotators and qualitatively assessed them. Qualitative outcomes were statistically significant differences between the absolute deviation of AI-based PGT measurements from the mean reviewer visual PGT estimate and measures of PGT estimate variability across reviewers (ie, range, standard deviation).
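The abstract does not give formulas for FNA, FPA, or ARE, so the following is only a minimal sketch of one plausible formulation, assuming each trace is available as a binary pixel mask and that FNA and FPA are normalized by the reference trace area and expressed as percentages; all function and variable names are illustrative, not from the study.

```python
import numpy as np

def trace_error_measures(test_mask: np.ndarray, ref_mask: np.ndarray) -> dict:
    """Error measures for one test tracing against one reference tracing.

    Both arguments are boolean pixel masks of the traced region (wound area
    or granulation tissue). The definitions below are assumptions for
    illustration; the abstract does not publish the study's exact formulas.
    """
    test = test_mask.astype(bool)
    ref = ref_mask.astype(bool)
    ref_area = ref.sum()

    false_negative = np.count_nonzero(ref & ~test)  # reference pixels missed by the test trace
    false_positive = np.count_nonzero(test & ~ref)  # test pixels lying outside the reference trace

    return {
        "FNA_pct": 100.0 * false_negative / ref_area,
        "FPA_pct": 100.0 * false_positive / ref_area,
        "ARE_pct": 100.0 * abs(test.sum() - ref_area) / ref_area,
    }
```

Per-photograph measures computed in this way for AI vs human and human vs human trace pairs would then be compared as distributions; the medians, IQRs, and P values reported below are consistent with a nonparametric comparison, although the specific statistical test is not named in the abstract.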
Results
A total of 199 photographs were selected for the study across both sites; mean (SD) patient age was 64 (18) years (range, 17-95 years), and 127 patients (63.8%) were women. The differences between the AI vs human and human vs human comparisons for FPA and ARE were not statistically significant. AI vs human FNA was slightly elevated compared with human vs human FNA (median [IQR], 7.7% [2.7%-21.2%] vs 5.7% [1.6%-14.9%]; P < .001), indicating that AI traces tended to slightly underestimate the human reference wound boundaries compared with human test traces. Two of 6 reviewers agreed at a statistically higher frequency that human tracings met the standard area definition, but overall agreement was moderate (352 yes responses of 583 total responses [60.4%] for AI and 793 yes responses of 1166 total responses [68.0%] for human tracings). AI PGT measurements fell within the typical range of variation in interreviewer visual PGT estimates; however, visual PGT estimates varied considerably (mean range, 34.8%; mean SD, 19.6%).
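For the qualitative PGT comparison, the per-photograph quantities behind these summary statistics can be sketched as follows, assuming the AI measurement and each reviewer's visual estimate are percentages between 0 and 100; the function name and example values are hypothetical and are not data from the study.

```python
import statistics

def pgt_agreement(ai_pgt: float, reviewer_pgt: list[float]) -> dict:
    """Compare an AI PGT measurement with reviewer visual estimates for one
    photograph. Illustrative sketch only; not the study's analysis code."""
    mean_estimate = statistics.mean(reviewer_pgt)
    return {
        "abs_diff_ai_vs_mean": abs(ai_pgt - mean_estimate),       # AI deviation from the mean reviewer estimate
        "reviewer_range": max(reviewer_pgt) - min(reviewer_pgt),  # spread of visual estimates
        "reviewer_sd": statistics.stdev(reviewer_pgt),            # sample SD across reviewers (needs >= 2 values)
    }

# Hypothetical example: the AI reports 55% granulation tissue; three reviewers
# visually estimate 40%, 60%, and 70%.
print(pgt_agreement(55.0, [40.0, 60.0, 70.0]))
```

Averaging the range and SD entries over photographs would yield summary variability figures of the kind reported above (mean range, mean SD), against which the AI deviations from the mean reviewer estimate can be judged.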
Conclusions and relevance
This study provides a framework for evaluating AI-based digital wound assessment tools that can be extended to automated measurements of other wound features or adapted to evaluate other AI-based digital image diagnostic tools. As AI-based wound assessment tools become more common across wound care settings, it will be important to rigorously validate their performance in helping clinicians obtain accurate wound assessments to guide clinical care.
SUBMITTER: Howell RS
PROVIDER: S-EPMC8134996 | biostudies-literature
REPOSITORIES: biostudies-literature