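Real-Time, Multi-Task Mobile Application for Automatic Bleeding and Non-Bleeding Frame Analysis in Video Capsule Endoscopy Using an Ensemble of Faster R-CNN and LinkNet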

IF 2.5; Zone 4, Computer Science; JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

International Journal of Imaging Systems and Technology. Pub Date: 2025-07-22. DOI: 10.1002/ima.70171

Divyansh Nautiyal, Manas Dhir, Tanisha Singh, Anushka Saini, Palak Handa


Citations: 0

Abstract


A real-time, multi-task mobile application for automatic analysis of bleeding and non-bleeding frames in video capsule endoscopy (VCE) is critical for early diagnosis but remains underexplored. This study presents a mobile application built with Flutter that automatically classifies VCE frames as bleeding or non-bleeding and then identifies and segments bleeding areas in real time. The application uses an ensemble deep learning model that integrates a Faster Region-based Convolutional Neural Network (Faster R-CNN) for frame-level classification and LinkNet for pixel-level segmentation. Faster R-CNN first detects and classifies VCE frames as bleeding or non-bleeding; LinkNet then segments the bleeding regions within the frames identified as bleeding. Both models were trained and validated on the publicly available WCEBleedGen dataset. To evaluate the effectiveness of the proposed ensemble, a comparative analysis was conducted against existing studies and state-of-the-art (SOTA) models in the field. For detection, the performance of Faster R-CNN was compared with two You Only Look Once (YOLO) variants: YOLOv5 and YOLOv12. For segmentation, LinkNet was compared with SegNet and UNet. Evaluation metrics included mean Average Precision at 0.5 (mAP@0.5), Dice coefficient, and Eigen class activation maps. The mobile application achieved an average inference time of 2.88 s per frame and 23.33 s for a batch of 10 frames. Overall, the ensemble model attained an mAP@0.5 of 0.92 and a Dice coefficient of 0.96, outperforming existing studies. Among SOTA models, Faster R-CNN outperformed the YOLO variants with a 25% higher mAP@0.5. On the validation dataset, LinkNet achieved a Dice coefficient 26% higher than SegNet and 5% higher than UNet, and produced more focused Eigen maps for different bleeding areas. This study represents the first attempt to develop a real-time, multi-task mobile application for VCE bleeding analysis. The application is open-source and freely available at https://github.com/misahub2023/VCE-BleedGen-Application, supporting accessibility, reproducibility, and future research in this field.
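The two-stage flow and the two headline metrics in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `detect` and `segment` callables, and the rule "a frame counts as bleeding if at least one detection scores at or above a threshold" are all assumptions made for illustration. The Dice and IoU formulas are the standard definitions; an IoU of at least 0.5 is the matching criterion behind mAP@0.5.

```python
from typing import Callable, Dict, List, Tuple

Mask = List[List[int]]                    # binary mask, 1 = bleeding pixel
Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def dice_coefficient(pred: Mask, target: Mask) -> float:
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal shape."""
    intersection = sum(
        p & t for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Two empty masks agree perfectly, so Dice is defined as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * intersection / total


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def analyze_frame(
    frame: object,
    detect: Callable[[object], List[Tuple[Box, float]]],  # hypothetical detector wrapper
    segment: Callable[[object], Mask],                     # hypothetical segmenter wrapper
    score_threshold: float = 0.5,                          # assumed value, not from the paper
) -> Dict[str, object]:
    """Run detection first; segment only frames flagged as bleeding."""
    boxes = [box for box, score in detect(frame) if score >= score_threshold]
    if not boxes:
        # No confident detection: skip the segmentation stage entirely.
        return {"label": "non-bleeding", "boxes": [], "mask": None}
    return {"label": "bleeding", "boxes": boxes, "mask": segment(frame)}
```

Gating the segmentation stage on the detector output means non-bleeding frames cost only one model call, which is one plausible reason to order the stages this way on a mobile device.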


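Source journal

International Journal of Imaging Systems and Technology (Engineering & Technology: Imaging Science & Photographic Technology)

CiteScore: 6.90

Self-citation rate: 6.10%

Articles published: 138

Review time: 3 months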

About the journal: The International Journal of Imaging Systems and Technology (IMA) is a forum for the exchange of ideas and results relevant to imaging systems, including imaging physics and informatics. The journal covers all imaging modalities in humans and animals. IMA accepts technically sound and scientifically rigorous research in the interdisciplinary field of imaging, including relevant algorithmic research and hardware and software development, and their applications relevant to medical research. The journal provides a platform to publish original research in structural and functional imaging. The journal is also open to imaging studies of the human body and of animals that describe novel diagnostic imaging and analysis methods. Technical, theoretical, and clinical research in both normal and clinical populations is encouraged. Submissions describing methods, software, databases, replication studies, as well as negative results are also considered. The scope of the journal includes, but is not limited to, the following in the context of biomedical research: Imaging and neuro-imaging modalities: structural MRI, functional MRI, PET, SPECT, CT, ultrasound, EEG, MEG, NIRS, etc.; Neuromodulation and brain stimulation techniques such as TMS and tDCS; Software and hardware for imaging, especially related to human and animal health; Image segmentation in normal and clinical populations; Pattern analysis and classification using machine learning techniques; Computational modeling and analysis; Brain connectivity and connectomics; Systems-level characterization of brain function; Neural networks and neurorobotics; Computer vision, based on human/animal physiology; Brain-computer interface (BCI) technology; Big data, databasing and data mining.