Dataset Information

Self-supervised representation learning for surgical activity recognition.

ABSTRACT:

Purpose

Virtual reality-based simulators have the potential to become an essential part of surgical education. To make full use of this potential, they must be able to automatically recognize activities performed by users and assess those. Since annotations of trajectories by human experts are expensive, there is a need for methods that can learn to recognize surgical activities in a data-efficient way.

Methods

We use self-supervised training of deep encoder-decoder architectures to learn representations of surgical trajectories from video data. These representations allow for semi-automatic extraction of features that capture information about semantically important events in the trajectories. Such features are processed as inputs of an unsupervised surgical activity recognition pipeline.

Results

Our experiments document that the performance of hidden semi-Markov models used for recognizing activities in a simulated myomectomy scenario benefits from using features extracted from representations learned while training a deep encoder-decoder network on the task of predicting the remaining surgery progress.

Conclusion

Our work is an important first step in the direction of making efficient use of features obtained from deep representation learning for surgical activity recognition in settings where only a small fraction of the existing data is annotated by human domain experts and where those annotations are potentially incomplete.

SUBMITTER: Paysan D

PROVIDER: S-EPMC8589823 | biostudies-literature | 2021 Nov

REPOSITORIES: biostudies-literature

ACCESS DATA

Publications

Self-supervised representation learning for surgical activity recognition.

Paysan Daniel D Haug Luis L Bajka Michael M Oelhafen Markus M Buhmann Joachim M JM

International journal of computer assisted radiology and surgery 20210920 11

<h4>Purpose</h4>Virtual reality-based simulators have the potential to become an essential part of surgical education. To make full use of this potential, they must be able to automatically recognize activities performed by users and assess those. Since annotations of trajectories by human experts are expensive, there is a need for methods that can learn to recognize surgical activities in a data-efficient way.<h4>Methods</h4>We use self-supervised training of deep encoder-decoder architectures ...[more]

PMID: 34542839

Similar Datasets

Project description:Colorectal cancer is a leading cause of cancer-related deaths globally. In recent years, the use of convolutional neural networks in computer-aided diagnosis (CAD) has facilitated simpler detection of early lesions like polyps during real-time colonoscopy. However, the majority of existing techniques require a large training dataset annotated by experienced experts. To alleviate the laborious task of image annotation and utilize the vast amounts of readily available unlabeled colonoscopy data to further improve the polyp detection ability, this study proposed a novel self-supervised representation learning method called feature pyramid siamese networks (FPSiam). First, a feature pyramid encoder module was proposed to effectively extract and fuse both local and global feature representations among colonoscopic images, which is important for dense prediction tasks like polyp detection. Next, a self-supervised visual feature representation containing the general feature of colonoscopic images is learned by the siamese networks. Finally, the feature representation will be transferred to the downstream colorectal polyp detection task. A total of 103 videos (861,400 frames), 100 videos (24,789 frames), and 60 videos (15,397 frames) in the LDPolypVideo dataset are used to pre-train, train, and test the performance of the proposed FPSiam and its counterparts, respectively. The experimental results have illustrated that our FPSiam approach obtains the optimal capability, which is better than that of other state-of-the-art self-supervised learning methods and is also higher than the method based on transfer learning by 2.3 mAP and 3.6 mAP for two typical detectors. In conclusion, FPSiam provides a cost-efficient solution for developing colorectal polyp detection systems, especially in conditions where only a small fraction of the dataset is labeled while the majority remains unlabeled. Besides, it also brings fresh perspectives into other endoscopic image analysis tasks.

Dataset Information

Self-supervised representation learning for surgical activity recognition.

Purpose

Methods

Results

Conclusion

Publications

Self-supervised representation learning for surgical activity recognition.

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets