Improving Foundation Model for Endoscopy Video Analysis via Representation Learning on Long Sequences


Abstract:

Recent advances in endoscopy video analysis have relied on relatively short video clips extracted from longer videos, or on millions of individual frames. However, these approaches tend to neglect a domain-specific characteristic of endoscopy data: it is typically presented as a long stream containing valuable semantic spatial and temporal information. To address this limitation, we propose EndoFM-LV, a foundation model developed under a minute-level pre-training framework on long endoscopy video sequences. Specifically, we propose a novel masked token modeling scheme within a teacher-student framework for self-supervised video pre-training, tailored for learning representations from long video sequences. For pre-training, we construct a large-scale long endoscopy video dataset comprising 6,469 endoscopic video samples, each longer than 1 minute, totaling over 13 million frames. EndoFM-LV is evaluated on four types of endoscopy tasks, namely classification, segmentation, detection, and workflow recognition, where it serves as the backbone or temporal module. Extensive experimental results demonstrate that our framework outperforms previous state-of-the-art video-based and frame-based approaches by a significant margin. On classification, segmentation, detection, and workflow recognition, respectively, it surpasses Endo-FM by 5.6% F1, 9.3% Dice, 8.4% F1, and 3.3% accuracy, and EndoSSL by 5.0% F1, 8.1% Dice, 9.3% F1, and 3.1% accuracy.
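The abstract names masked token modeling within a teacher-student framework but gives no implementation details. The following is a minimal sketch of the general idea only, not the EndoFM-LV method: it assumes an exponential-moving-average (EMA) teacher, a random token mask, and a mean-squared-error loss on masked positions, none of which are confirmed by the abstract. The function names `sample_mask`, `ema_update`, and `masked_token_loss`, and the parameters `mask_ratio` and `momentum`, are hypothetical.

```python
import random


def sample_mask(num_tokens, mask_ratio, rng):
    """Choose which token positions are hidden from the student.

    Assumption: uniform random masking; the paper's scheme may differ.
    """
    num_masked = int(round(num_tokens * mask_ratio))
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked for i in range(num_tokens)]


def ema_update(teacher, student, momentum):
    """Move each teacher weight a small step toward the student weight.

    Assumption: an EMA teacher, a common choice in teacher-student
    self-supervised learning.
    """
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher, student)]


def masked_token_loss(student_pred, teacher_target, mask):
    """Mean squared error computed over masked positions only.

    Assumption: MSE stands in for whatever objective the paper uses.
    """
    errors = [(p - t) ** 2
              for p, t, m in zip(student_pred, teacher_target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    mask = sample_mask(num_tokens=16, mask_ratio=0.5, rng=rng)
    # Toy scalar "tokens": the teacher sees the full sequence,
    # while the student must predict targets at the masked positions.
    teacher_out = [float(i) for i in range(16)]
    student_out = [0.0] * 16
    print("masked positions:", sum(mask))
    print("loss:", masked_token_loss(student_out, teacher_out, mask))
```

In this sketch the student is trained to match the teacher's outputs at positions it cannot see, and the teacher's weights follow the student's through `ema_update` instead of receiving gradients.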
Page(s): 1 - 12
Date of Publication: 13 February 2025

PubMed ID: 40031835
