Video Capsule Endoscopy Classification using Focal Modulation Guided Convolutional Neural Network.

Abhishek Srivastava, Nikhil Kumar Tomar, Ulas Bagci, Debesh Jha
{"title":"Video Capsule Endoscopy Classification using Focal Modulation Guided Convolutional Neural Network.","authors":"Abhishek Srivastava,&nbsp;Nikhil Kumar Tomar,&nbsp;Ulas Bagci,&nbsp;Debesh Jha","doi":"10.1109/CBMS55023.2022.00064","DOIUrl":null,"url":null,"abstract":"<p><p>Video capsule endoscopy is a hot topic in computer vision and medicine. Deep learning can have a positive impact on the future of video capsule endoscopy technology. It can improve the anomaly detection rate, reduce physicians' time for screening, and aid in real-world clinical analysis. Computer-Aided diagnosis (CADx) classification system for video capsule endoscopy has shown a great promise for further improvement. For example, detection of cancerous polyp and bleeding can lead to swift medical response and improve the survival rate of the patients. To this end, an automated CADx system must have high throughput and decent accuracy. In this study, we propose <i>FocalConvNet</i>, a focal modulation network integrated with lightweight convolutional layers for the classification of small bowel anatomical landmarks and luminal findings. FocalConvNet leverages focal modulation to attain global context and allows global-local spatial interactions throughout the forward pass. Moreover, the convolutional block with its intrinsic inductive/learning bias and capacity to extract hierarchical features allows our FocalConvNet to achieve favourable results with high throughput. We compare our FocalConvNet with other state-of-the-art (SOTA) on Kvasir-Capsule, a large-scale VCE dataset with 44,228 frames with 13 classes of different anomalies. We achieved the weighted F1-score, recall and Matthews correlation coefficient (MCC) of 0.6734, 0.6373 and 0.2974, respectively, outperforming SOTA methodologies. Further, we obtained the highest throughput of 148.02 images/second rate to establish the potential of FocalConvNet in a real-time clinical environment. The code of the proposed FocalConvNet is available at https://github.com/NoviceMAn-prog/FocalConvNet.</p>","PeriodicalId":74567,"journal":{"name":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","volume":"2022 ","pages":"323-328"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9914988/pdf/nihms-1871537.pdf","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Symposium on Computer-Based Medical Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CBMS55023.2022.00064","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Video capsule endoscopy is an active topic in computer vision and medicine. Deep learning can have a positive impact on the future of video capsule endoscopy technology: it can improve the anomaly detection rate, reduce physicians' screening time, and aid real-world clinical analysis. Computer-aided diagnosis (CADx) classification systems for video capsule endoscopy have shown great promise for further improvement. For example, detection of cancerous polyps and bleeding can lead to a swift medical response and improve patient survival rates. To this end, an automated CADx system must have high throughput and decent accuracy. In this study, we propose FocalConvNet, a focal modulation network integrated with lightweight convolutional layers for the classification of small bowel anatomical landmarks and luminal findings. FocalConvNet leverages focal modulation to attain global context and allows global-local spatial interactions throughout the forward pass. Moreover, the convolutional block, with its intrinsic inductive/learning bias and capacity to extract hierarchical features, allows FocalConvNet to achieve favourable results with high throughput. We compare FocalConvNet with other state-of-the-art (SOTA) methods on Kvasir-Capsule, a large-scale VCE dataset with 44,228 frames across 13 classes of anomalies. We achieved a weighted F1-score, recall, and Matthews correlation coefficient (MCC) of 0.6734, 0.6373, and 0.2974, respectively, outperforming SOTA methodologies. Further, we obtained the highest throughput of 148.02 images/second, establishing the potential of FocalConvNet in a real-time clinical environment. The code of the proposed FocalConvNet is available at https://github.com/NoviceMAn-prog/FocalConvNet.
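As a rough illustration of the focal modulation mechanism the abstract refers to, the sketch below implements a minimal focal modulation block in PyTorch. It follows the general FocalNet formulation (hierarchical depth-wise convolutions aggregate local-to-global context, which then modulates a query projection); it is not the authors' released code, and the module names, number of focal levels, and kernel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FocalModulation(nn.Module):
    """Minimal focal modulation block (illustrative sketch, not the paper's implementation)."""
    def __init__(self, dim, focal_levels=3, focal_window=3):
        super().__init__()
        self.focal_levels = focal_levels
        # Single projection producing the query, the initial context, and per-level gates
        self.f = nn.Linear(dim, 2 * dim + (focal_levels + 1))
        self.h = nn.Conv2d(dim, dim, kernel_size=1)   # context aggregation
        self.proj = nn.Linear(dim, dim)
        self.act = nn.GELU()
        # Hierarchical depth-wise convolutions with growing kernels for multi-scale context
        self.focal_layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size=focal_window + 2 * k,
                          padding=(focal_window + 2 * k) // 2, groups=dim, bias=False),
                nn.GELU(),
            )
            for k in range(focal_levels)
        ])

    def forward(self, x):                      # x: (B, H, W, C)
        B, H, W, C = x.shape
        q, ctx, gates = torch.split(self.f(x), (C, C, self.focal_levels + 1), dim=-1)
        ctx = ctx.permute(0, 3, 1, 2)          # (B, C, H, W)
        ctx_all = 0
        for level, layer in enumerate(self.focal_layers):
            ctx = layer(ctx)                   # progressively larger receptive field
            ctx_all = ctx_all + ctx * gates[..., level].unsqueeze(1)
        # Global context from spatial average pooling, gated like the local levels
        ctx_global = self.act(ctx.mean(dim=(2, 3), keepdim=True))
        ctx_all = ctx_all + ctx_global * gates[..., self.focal_levels].unsqueeze(1)
        # Modulate the query with the aggregated multi-scale context
        x_out = q * self.h(ctx_all).permute(0, 2, 3, 1)
        return self.proj(x_out)

# Quick shape check on a dummy feature map
x = torch.randn(2, 56, 56, 64)                 # (batch, H, W, channels)
block = FocalModulation(dim=64)
print(block(x).shape)                          # torch.Size([2, 56, 56, 64])
```

In FocalConvNet, modulation of this kind is interleaved with lightweight convolutional blocks to supply the inductive bias and hierarchical feature extraction the abstract mentions; the exact integration is given in the linked repository.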

