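Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation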

Impact factor: 3.4 · Region 2, Engineering & Technology · JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Shujun Gao, Xiaomei Yu, Xiao Liang, Xuanchi Chen, Xiangwei Zheng
Displays, Volume 91, Article 103234. Published 2025-09-30. DOI: 10.1016/j.displa.2025.103234. Source: https://www.sciencedirect.com/science/article/pii/S0141938225002719
Citations: 0

Abstract

Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation
Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries in gastroscopic images significantly degrade the clinical performance of existing recognition approaches. To address these challenges, we propose GASR (Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation), a method for identifying five representative gastric regions: the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus. Specifically, we propose a Dual-branch Structural Perception (DBSP) module that exploits the complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA), inspired by Atrous Spatial Pyramid Pooling (ASPP), to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA), based on the Pyramid Scene Parsing (PSP) mechanism, to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, obtained with five-fold cross-validation, demonstrate that GASR achieves a recognition accuracy of 97.15% with robust boundary discrimination and strong generalization performance. We consider this level of performance acceptable for clinical practice, and it shows potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.
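To make the idea of "multi-scale pooling operations" concrete, the following is a minimal pure-Python sketch of PSP-style pyramid pooling: a feature map is average-pooled onto grids of several sizes and the results are concatenated into one descriptor. It is not the authors' MGPA implementation, which the abstract does not specify; the function names `adaptive_avg_pool` and `pyramid_pool`, the bin sizes `(1, 2, 4)`, and the toy 4x4 single-channel input are all assumptions chosen for illustration.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D feature map (list of rows) onto a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(bins):
        # Integer floor/ceil bounds so the bins cover every row.
        r0 = (i * h) // bins
        r1 = ((i + 1) * h + bins - 1) // bins
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = ((j + 1) * w + bins - 1) // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(fmap, bin_sizes=(1, 2, 4)):
    """Concatenate pooled grids, coarse to fine, into one flat descriptor.

    The bin sizes are an illustrative assumption, not values from the paper.
    """
    descriptor = []
    for bins in bin_sizes:
        for row in adaptive_avg_pool(fmap, bins):
            descriptor.extend(row)
    return descriptor


# Toy 4x4 single-channel feature map with arbitrary values.
feature_map = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
descriptor = pyramid_pool(feature_map)
print(len(descriptor), descriptor[:5])
```

The coarsest grid (1x1) summarizes the global layout, while finer grids keep progressively more spatial detail. In a real network the same pooling would typically be applied per channel to a learned feature tensor, for example with an adaptive average pooling layer from a deep learning framework.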
Source journal
Displays (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 4.60
Self-citation rate: 25.60%
Articles published: 138
Review time: 92 days
About the journal: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.