Tsedeke Temesgen Habe, Keijo Haataja, Pekka Toivanen
Frontiers in Artificial Intelligence, vol. 8, article 1529814. Published 2025-04-30. DOI: https://doi.org/10.3389/frai.2025.1529814
Precision enhancement in wireless capsule endoscopy: a novel transformer-based approach for real-time video object detection.
Background: Wireless Capsule Endoscopy (WCE) enables non-invasive imaging of the gastrointestinal tract but generates large volumes of video data, which makes accurate real-time abnormality detection challenging. Traditional detection methods struggle with uncontrolled illumination, complex textures, and high-speed processing demands.
Methods: This study presents a novel approach using Real-Time Detection Transformer (RT-DETR), a transformer-based object detection model, specifically optimized for WCE video analysis. The model captures contextual information between frames and handles variable image conditions. It was evaluated using the Kvasir-Capsule dataset, with performance assessed across three RT-DETR variants: Small (S), Medium (M), and X-Large (X).
Results: RT-DETR-X achieved the highest detection precision. RT-DETR-M offered a practical trade-off between accuracy and speed, while RT-DETR-S processed frames at 270 FPS, enabling real-time performance. All three models demonstrated improved detection accuracy and computational efficiency compared to baseline methods.
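To put the 270 FPS figure in perspective, it corresponds to an average budget of roughly 3.7 ms per frame. The sketch below shows one common way such a throughput number can be measured and converted to latency. It is not the authors' benchmarking code: `measure_fps`, `latency_ms`, and the `infer` callable are hypothetical names introduced for illustration, and any real detector would be substituted for the stand-in function.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(infer: Callable[[Any], Any], frames: Iterable[Any], warmup: int = 5) -> float:
    """Return how many frames per second `infer` processes over `frames`.

    `infer` is a hypothetical stand-in for a detector's forward pass.
    """
    frames = list(frames)
    # Warm-up calls are not timed, so one-off start-up costs do not skew the result.
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    if elapsed == 0:
        return float("inf")
    return len(frames) / elapsed


def latency_ms(fps: float) -> float:
    """Convert a throughput in frames per second to average milliseconds per frame."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Dummy workload in place of a real model (assumption for illustration only).
    fps = measure_fps(lambda frame: sum(range(1000)), range(200))
    print(f"measured throughput: {fps:.0f} FPS")
    print(f"270 FPS corresponds to {latency_ms(270):.2f} ms per frame")
```

Note that a figure measured this way depends on hardware, batch size, and input resolution, none of which are stated in the abstract.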
Discussion: The RT-DETR framework significantly enhances precision and real-time performance in gastrointestinal abnormality detection using WCE. Its clinical potential lies in supporting faster and more accurate diagnosis. Future work will focus on further optimization and deployment in endoscopic video analysis systems.
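The abstract does not spell out how detection precision was computed. As a minimal sketch of the usual definition, a predicted box counts as a true positive when its intersection over union (IoU) with a not-yet-matched ground-truth box reaches a threshold, and precision is the fraction of predictions that are true positives. The 0.5 threshold, the greedy matching, and the function names below are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(preds: List[Box], truths: List[Box], thr: float = 0.5) -> float:
    """Fraction of predictions matched one-to-one to a ground-truth box.

    `preds` is assumed to be sorted by descending confidence, so the most
    confident prediction gets first pick of the ground-truth boxes.
    """
    unmatched = list(truths)
    true_positives = 0
    for pred in preds:
        best = max(unmatched, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= thr:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box can be matched only once
    return true_positives / len(preds) if preds else 0.0


if __name__ == "__main__":
    truths = [(0.0, 0.0, 10.0, 10.0)]
    preds = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
    print(precision(preds, truths))  # one of the two predictions matches
```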