{"title":"基于双分支感知和多尺度语义聚合的胃镜图像胃解剖部位识别","authors":"Shujun Gao , Xiaomei Yu , Xiao Liang , Xuanchi Chen , Xiangwei Zheng","doi":"10.1016/j.displa.2025.103234","DOIUrl":null,"url":null,"abstract":"<div><div>Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries present in gastroscopic images significantly decrease the clinical performance of existing recognition approaches. To address above challenges, we propose a Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation (GASR) for the identification of five representative gastric regions (the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus). Specifically, we propose a Dual-branch Structural Perception (DBSP) module for leveraging the effective complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA) inspired by the Atrous Spatial Pyramid Pooling (ASPP) to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA) based on the Pyramid Scene Parsing (PSP) mechanism to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, using five-fold cross-validation demonstrate that GASR achieves a recognition accuracy of 97.15%, with robust boundary discrimination and strong generalization performance and can be accepted in clinical practice, showing potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103234"},"PeriodicalIF":3.4000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation\",\"authors\":\"Shujun Gao , Xiaomei Yu , Xiao Liang , Xuanchi Chen , Xiangwei Zheng\",\"doi\":\"10.1016/j.displa.2025.103234\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries present in gastroscopic images significantly decrease the clinical performance of existing recognition approaches. To address above challenges, we propose a Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation (GASR) for the identification of five representative gastric regions (the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus). 
Specifically, we propose a Dual-branch Structural Perception (DBSP) module for leveraging the effective complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA) inspired by the Atrous Spatial Pyramid Pooling (ASPP) to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA) based on the Pyramid Scene Parsing (PSP) mechanism to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, using five-fold cross-validation demonstrate that GASR achieves a recognition accuracy of 97.15%, with robust boundary discrimination and strong generalization performance and can be accepted in clinical practice, showing potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.</div></div>\",\"PeriodicalId\":50570,\"journal\":{\"name\":\"Displays\",\"volume\":\"91 \",\"pages\":\"Article 103234\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Displays\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0141938225002719\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225002719","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Gastric Anatomical Sites Recognition in Gastroscopic Images Based on Dual-branch Perception and Multi-scale Semantic Aggregation
Accurate recognition of key anatomical sites in gastroscopic images is crucial for systematic screening and region-specific diagnosis of early gastric cancer. However, subtle inter-regional differences and indistinct structural boundaries in gastroscopic images significantly degrade the clinical performance of existing recognition approaches. To address these challenges, we propose GASR, a method for Gastric Anatomical Site Recognition in Gastroscopic Images based on Dual-branch Perception and Multi-scale Semantic Aggregation, which identifies five representative gastric regions (the greater and lesser curvatures of the antrum, the incisura angularis, and the greater and lesser curvatures of the corpus). Specifically, we propose a Dual-branch Structural Perception (DBSP) module that leverages the complementarity between the local feature extraction of convolutional neural networks (CNNs) and the global semantic modeling of the Swin Transformer. To further improve contextual feature modeling, we develop a Multi-scale Contextual Sampling Aggregator (MCSA), inspired by Atrous Spatial Pyramid Pooling (ASPP), to extract features across multiple receptive fields. Additionally, we design a Multi-granular Pooling Aggregator (MGPA) based on the Pyramid Scene Parsing (PSP) mechanism to capture hierarchical spatial semantics and global structural layouts through multi-scale pooling operations. Experimental results on a private, expert-annotated endoscopic image dataset, evaluated with five-fold cross-validation, demonstrate that GASR achieves a recognition accuracy of 97.15% with robust boundary discrimination and strong generalization, a level acceptable for clinical practice, showing potential for clinical deployment in gastroscopy-assisted diagnosis and automated screening of early gastric cancer.
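The abstract does not provide implementation details for the DBSP module, so the following PyTorch sketch only illustrates the general dual-branch idea: a CNN branch for local texture cues and a transformer branch for global context, with the two pooled feature vectors fused for classification. The module name, dimensions, and the plain ViT-style encoder (standing in for a Swin Transformer) are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class DualBranchPerception(nn.Module):
    """Hypothetical dual-branch sketch: CNN (local) + transformer (global)."""
    def __init__(self, num_classes=5, dim=96):
        super().__init__()
        # CNN branch: local texture/edge cues (shallow stand-in for a deeper backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
        )
        # Transformer branch: global dependencies over image patches
        # (a plain ViT-style encoder as a stand-in for a Swin Transformer).
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, x):
        local_feat = self.cnn(x).mean(dim=(2, 3))                 # (B, dim) pooled local features
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)   # (B, N, dim) patch tokens
        global_feat = self.transformer(tokens).mean(dim=1)        # (B, dim) pooled global features
        return self.head(torch.cat([local_feat, global_feat], dim=1))

logits = DualBranchPerception()(torch.randn(2, 3, 224, 224))  # -> shape (2, 5), one score per gastric region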
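Since MCSA is described only as "inspired by ASPP", the sketch below shows the generic ASPP pattern it refers to: parallel atrous (dilated) convolutions sample context at several receptive fields and are fused by a 1x1 convolution. The dilation rates and channel sizes are assumptions for illustration.

import torch
import torch.nn as nn

class MultiScaleContextAggregator(nn.Module):
    """ASPP-style multi-rate context sampling (illustrative, not the paper's MCSA)."""
    def __init__(self, in_ch=96, out_ch=96, rates=(1, 6, 12, 18)):
        super().__init__()
        # One branch per dilation rate; padding=rate keeps the spatial size unchanged.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Concatenate the multi-receptive-field responses, then fuse channel-wise.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = MultiScaleContextAggregator()(torch.randn(2, 96, 28, 28))  # -> (2, 96, 28, 28)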
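Likewise, MGPA is only said to follow the PSP mechanism; the minimal sketch below shows that mechanism: adaptive average pooling at several grid sizes captures layout at different granularities, and the pooled maps are upsampled and concatenated with the input. Bin sizes and channel reduction are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGranularPooling(nn.Module):
    """PSP-style pyramid pooling (illustrative, not the paper's MGPA)."""
    def __init__(self, in_ch=96, bins=(1, 2, 3, 6)):
        super().__init__()
        red = in_ch // len(bins)  # reduce each pooled map so the total matches in_ch
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, red, kernel_size=1, bias=False),
                          nn.BatchNorm2d(red), nn.ReLU(inplace=True))
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        # Pool at each granularity, project, and upsample back to the input resolution.
        pooled = [F.interpolate(s(x), size=(h, w), mode='bilinear', align_corners=False)
                  for s in self.stages]
        return torch.cat([x, *pooled], dim=1)  # (B, 2*in_ch, H, W) when in_ch is divisible by len(bins)

out = MultiGranularPooling()(torch.randn(2, 96, 28, 28))  # -> (2, 192, 28, 28)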
Journal introduction:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.