Combining the Variational and Deep Learning Techniques for Classification of Video Capsule Endoscopic Images.

Journal of Imaging Informatics in Medicine. Pub Date: 2025-01-03. DOI: 10.1007/s10278-024-01352-y

Bhavana Singh, Pushpendra Kumar, Shailendra Kumar Jain



Abstract

Gastrointestinal tract-related cancers pose a significant health burden, with high mortality rates. Video capsule endoscopy is used to detect anomalies of the gastrointestinal tract that may progress to cancer. The number of video capsule endoscopic (VCE) images produced per examination is enormous and requires hours of analysis by clinicians. There is therefore a pressing need for automated, computer-aided lesion classification techniques. Computer-aided systems use deep learning (DL) techniques because they can potentially improve anomaly detection rates. However, most DL techniques in the literature classify static frames and thus use only the spatial information of the image. In addition, they perform only binary classification. The presented work therefore proposes a framework for multi-class classification of VCE images that uses the dynamic information of the images. The proposed algorithm combines a fractional order variational model with a DL model. The fractional order variational model captures the dynamic information of VCE images by estimating optical flow color maps, which are then fed to the DL model for training. The DL model performs the multi-class classification task and localizes the region of interest with the maximum class score. The DL model is inspired by the Faster RCNN approach, and its backbone architecture is EfficientNet B0. The proposed framework achieves an average AUC of 0.98, a mAP of 0.93, and a balanced accuracy of 0.878. Hence, the proposed model is efficient in VCE image classification and in detecting the region of interest.
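To make the notion of an optical flow color map concrete, the sketch below shows one common way to turn a dense flow field into an RGB image: direction is mapped to hue and magnitude to saturation. It is only an illustration under stated assumptions. The abstract does not say which color encoding the authors use, the flow components `u` and `v` are assumed to be already estimated (the fractional order variational solver itself is not shown), and the function name `flow_to_color` and the parameter `max_mag` are hypothetical.

```python
import numpy as np


def flow_to_color(u, v, max_mag=None):
    """Encode a dense flow field (u, v) as an RGB image.

    Assumed convention (not taken from the paper): hue encodes the flow
    direction, saturation encodes the flow magnitude, value is fixed at 1,
    so zero motion appears white.
    """
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)

    mag = np.hypot(u, v)        # per-pixel flow magnitude
    ang = np.arctan2(v, u)      # per-pixel direction in [-pi, pi]

    if max_mag is None:
        max_mag = mag.max()     # normalise by the largest motion in the frame

    hue = (ang + np.pi) / (2.0 * np.pi)  # map direction to [0, 1]
    if max_mag > 0:
        sat = np.clip(mag / max_mag, 0.0, 1.0)
    else:
        sat = np.zeros_like(mag)         # no motion anywhere
    val = np.ones_like(mag)

    # Standard HSV -> RGB conversion, vectorised over all pixels.
    sector = np.floor(hue * 6.0)
    frac = hue * 6.0 - sector
    idx = sector.astype(int) % 6
    p = val * (1.0 - sat)
    q = val * (1.0 - frac * sat)
    t = val * (1.0 - (1.0 - frac) * sat)

    r = np.choose(idx, [val, q, p, p, t, val])
    g = np.choose(idx, [t, val, val, q, p, p])
    b = np.choose(idx, [p, p, t, val, val, q])

    rgb = np.stack([r, g, b], axis=-1)
    return (rgb * 255.0).round().astype(np.uint8)
```

Under this assumed encoding, an image of shape `(H, W, 3)` is produced for each pair of consecutive frames and could be passed to a detector in place of the raw frame. Normalising by the per-frame maximum is a design choice made here for simplicity; a fixed `max_mag` keeps colors comparable across frames.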

