DECA-Net: Dual encoder and cross-attention fusion network for surgical instrument segmentation

IF 3.9 · Zone 3, Computer Science · Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Sixin Liang, Jianzhou Zhang, Ang Bian, Jiaying You
Pattern Recognition Letters, Volume 185, Pages 130-136 (Journal Article, published 2024-07-31)
DOI: 10.1016/j.patrec.2024.07.019
URL: https://www.sciencedirect.com/science/article/pii/S0167865524002228
Citations: 0

Abstract

Minimally invasive surgery is now widely used to reduce surgical risks, and automatic, accurate instrument segmentation from endoscope videos is crucial for computer-assisted surgical guidance. However, despite the rapid development of CNN-based surgical instrument segmentation methods, challenges such as motion blur and illumination variation can still cause erroneous segmentation. In this work, we propose a novel dual encoder and cross-attention network (DECA-Net) that overcomes these limitations through enhanced context representation and a feature fusion that suppresses irrelevant features. Our approach introduces a dual encoder unit, combining a CNN and a Transformer, that extracts local features and global context information, thereby strengthening the model's robustness to varying illumination conditions. An attention fusion module then combines the local features with the global context information and selects instrument-related boundary features. To bridge the semantic gap between the encoder and decoder, we propose a parallel dual cross-attention (DCA) block that captures channel and spatial dependencies across the multi-scale encoder features to build enhanced skip connections. Experimental results show that the proposed method achieves state-of-the-art performance on the Endovis2017 and Kvasir-instrument datasets.
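
The abstract gives no layer-level details of the DCA block, so the following sketch illustrates only the general mechanism it names. It applies scaled dot-product cross-attention in parallel along two axes, once with spatial positions as tokens and once with channels as tokens, with keys and values drawn from several encoder scales. It is written in plain Python for readability. The function names, the omission of learned query/key/value projections, the assumption that every encoder stage has already been resampled to the same N x C shape, and the residual-sum fusion of the two branches are all hypothetical simplifications, not the authors' implementation.

```python
import math

# A matrix is a list of rows; every row has the same length.
Matrix = list[list[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Dot each row of `a` with each column of `b`.
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(keys))
    weights = softmax_rows([[s * scale for s in row] for row in scores])
    return matmul(weights, values)


def dual_cross_attention(query_stage: Matrix, all_stages: list[Matrix]) -> Matrix:
    """Hypothetical parallel spatial + channel cross-attention.

    query_stage: N x C features of one encoder stage.
    all_stages:  S encoder stages, each assumed already resampled to N x C.
    """
    # Spatial branch: each spatial position is a token; keys/values are the
    # positions of all stages stacked together, giving an (S*N) x C matrix.
    position_tokens = [row for stage in all_stages for row in stage]
    spatial = cross_attention(query_stage, position_tokens, position_tokens)

    # Channel branch: each channel is a token (transpose to C x N); keys/values
    # are the channels of all stages stacked together, giving (S*C) x N.
    channel_tokens = [row for stage in all_stages for row in transpose(stage)]
    channel = transpose(
        cross_attention(transpose(query_stage), channel_tokens, channel_tokens)
    )

    # Fuse the two parallel branches with a residual sum (an assumption).
    return [
        [q + s + c for q, s, c in zip(q_row, s_row, c_row)]
        for q_row, s_row, c_row in zip(query_stage, spatial, channel)
    ]


if __name__ == "__main__":
    # Two toy encoder stages, each with N=3 positions and C=2 channels.
    stage_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    stage_b = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
    fused = dual_cross_attention(stage_a, [stage_a, stage_b])
    print(len(fused), len(fused[0]))  # prints: 3 2
```

Transposing the feature matrix is what lets one attention routine serve both branches. With channels as tokens, the attention map has C x (S*C) entries instead of N x (S*N), which is much smaller when the spatial resolution N is large.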

Source journal: Pattern Recognition Letters
Subject category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 12.40
Self-citation rate: 5.90%
Articles published: 287
Review time: 9.1 months
About the journal: Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.