{"title":"基于双分支感知和多尺度语义聚合的胃镜图像胃解剖部位识别","authors":"Shujun Gao , Xiaomei Yu , Xiao Liang , Xuanchi Chen , Xiangwei Zheng","doi":"10.1016/j.displa.2025.103234","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries present in gastroscopic images significantly decrease the clinical performance of existing recognition approaches. To address above challenges, we propose a Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation (GASR) for the identification of five representative gastric regions (the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus). Specifically, we propose a Dual-branch Structural Perception (DBSP) module for leveraging the effective complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA) inspired by the Atrous Spatial Pyramid Pooling (ASPP) to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA) based on the Pyramid Scene Parsing (PSP) mechanism to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, using five-fold cross-validation demonstrate that GASR achieves a recognition accuracy of 97.15%, with robust boundary discrimination and strong generalization performance and can be accepted in clinical practice, showing potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103234"},"PeriodicalIF":3.4000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation\",\"authors\":\"Shujun Gao , Xiaomei Yu , Xiao Liang , Xuanchi Chen , Xiangwei Zheng\",\"doi\":\"10.1016/j.displa.2025.103234\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries present in gastroscopic images significantly decrease the clinical performance of existing recognition approaches. To address above challenges, we propose a Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation (GASR) for the identification of five representative gastric regions (the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus). 
Specifically, we propose a Dual-branch Structural Perception (DBSP) module for leveraging the effective complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA) inspired by the Atrous Spatial Pyramid Pooling (ASPP) to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA) based on the Pyramid Scene Parsing (PSP) mechanism to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, using five-fold cross-validation demonstrate that GASR achieves a recognition accuracy of 97.15%, with robust boundary discrimination and strong generalization performance and can be accepted in clinical practice, showing potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.</div></div>\",\"PeriodicalId\":50570,\"journal\":{\"name\":\"Displays\",\"volume\":\"91 \",\"pages\":\"Article 103234\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Displays\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0141938225002719\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225002719","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation
Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries in gastroscopic images significantly degrade the clinical performance of existing recognition approaches. To address these challenges, we propose GASR, a method for Gastric Anatomical Site Recognition in Gastroscopic Images based on Dual-branch Perception and Multi-scale Semantic Aggregation, which identifies five representative gastric regions (the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus). Specifically, we propose a Dual-branch Structural Perception (DBSP) module that leverages the complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA), inspired by Atrous Spatial Pyramid Pooling (ASPP), to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA) based on the Pyramid Scene Parsing (PSP) mechanism to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, evaluated with five-fold cross-validation, demonstrate that GASR achieves a recognition accuracy of 97.15% with robust boundary discrimination and strong generalization, a level acceptable for clinical practice, showing potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.
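The abstract does not provide implementation details for the DBSP module, so the following PyTorch sketch only illustrates the general dual-branch idea: a CNN branch for local texture cues and a transformer branch for global context, with the two pooled feature vectors fused for classification. The module name, dimensions, and the plain ViT-style encoder (standing in for a Swin Transformer) are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class DualBranchPerception(nn.Module):
    """Hypothetical dual-branch sketch: CNN (local) + transformer (global)."""
    def __init__(self, num_classes=5, dim=96):
        super().__init__()
        # CNN branch: local texture/edge cues (shallow stand-in for a deeper backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
        )
        # Transformer branch: global dependencies over image patches
        # (a plain ViT-style encoder as a stand-in for a Swin Transformer).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, x):
        local_feat = self.cnn(x).mean(dim=(2, 3))                 # (B, dim) pooled local features
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim) patch tokens
        global_feat = self.transformer(tokens).mean(dim=1)        # (B, dim) pooled global features
        return self.head(torch.cat([local_feat, global_feat], dim=1))

logits = DualBranchPerception()(torch.randn(2, 3, 224, 224))  # -> shape (2, 5), one score per gastric region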
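Since MCSA is described only as "inspired by ASPP", the sketch below shows the generic ASPP pattern it refers to: parallel atrous (dilated) convolutions sample context at several receptive fields and are fused by a 1x1 convolution. The dilation rates and channel sizes are assumptions for illustration.

import torch
import torch.nn as nn

class MultiScaleContextAggregator(nn.Module):
    """ASPP-style multi-rate context sampling (illustrative, not the paper's MCSA)."""
    def __init__(self, in_ch=96, out_ch=96, rates=(1, 6, 12, 18)):
        super().__init__()
        # One branch per dilation rate; padding=rate keeps the spatial size unchanged.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Concatenate the multi-receptive-field responses, then fuse channel-wise.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = MultiScaleContextAggregator()(torch.randn(2, 96, 28, 28))  # -> (2, 96, 28, 28)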
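Likewise, MGPA is only said to follow the PSP mechanism; the minimal sketch below shows that mechanism: adaptive average pooling at several grid sizes captures layout at different granularities, and the pooled maps are upsampled and concatenated with the input. Bin sizes and channel reduction are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGranularPooling(nn.Module):
    """PSP-style pyramid pooling (illustrative, not the paper's MGPA)."""
    def __init__(self, in_ch=96, bins=(1, 2, 3, 6)):
        super().__init__()
        red = in_ch // len(bins)  # reduce each pooled map so the total matches in_ch
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, red, kernel_size=1, bias=False),
                          nn.BatchNorm2d(red), nn.ReLU(inplace=True))
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        # Pool at each granularity, project, and upsample back to the input resolution.
        pooled = [F.interpolate(s(x), size=(h, w), mode='bilinear', align_corners=False)
                  for s in self.stages]
        return torch.cat([x, *pooled], dim=1)  # (B, 2*in_ch, H, W) when in_ch is divisible by len(bins)

out = MultiGranularPooling()(torch.randn(2, 96, 28, 28))  # -> (2, 192, 28, 28)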
Journal introduction:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.