DECA-Net：用于手术器械分割的双编码器和交叉注意融合网络

IF 3.9 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Letters Pub Date : 2024-07-31 DOI:10.1016/j.patrec.2024.07.019

Sixin Liang , Jianzhou Zhang , Ang Bian , Jiaying You

{"title":"DECA-Net：用于手术器械分割的双编码器和交叉注意融合网络","authors":"Sixin Liang , Jianzhou Zhang , Ang Bian , Jiaying You","doi":"10.1016/j.patrec.2024.07.019","DOIUrl":null,"url":null,"abstract":"<div><p>Minimally invasive surgery is now widely used to reduce surgical risks, and automatic and accurate instrument segmentation from endoscope videos is crucial for computer-assisted surgical guidance. However, given the rapid development of CNN-based surgical instrument segmentation methods, challenges like motion blur and illumination issues can still cause erroneous segmentation. In this work, we propose a novel dual encoder and cross-attention network (DECA-Net) to overcome these limitations with enhanced context representation and irrelevant feature fusion. Our approach introduces a CNN and Transformer based dual encoder unit for local features and global context information extraction and hence strength the model’s robustness against various illumination conditions. Then an attention fusion module is utilized to combine local feature and global context information and to select instrument-related boundary features. To bridge the semantic gap between encoder and decoder, we propose a parallel dual cross-attention (DCA) block to capture the channel and spatial dependencies across multi-scale encoder to build the enhanced skip connection. Experimental results show that the proposed method achieves state-of-the-art performance on Endovis2017 and Kvasir-instrument datasets.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"185 ","pages":"Pages 130-136"},"PeriodicalIF":3.9000,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DECA-Net: Dual encoder and cross-attention fusion network for surgical instrument segmentation\",\"authors\":\"Sixin Liang , Jianzhou Zhang , Ang Bian , Jiaying You\",\"doi\":\"10.1016/j.patrec.2024.07.019\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Minimally invasive surgery is now widely used to reduce surgical risks, and automatic and accurate instrument segmentation from endoscope videos is crucial for computer-assisted surgical guidance. However, given the rapid development of CNN-based surgical instrument segmentation methods, challenges like motion blur and illumination issues can still cause erroneous segmentation. In this work, we propose a novel dual encoder and cross-attention network (DECA-Net) to overcome these limitations with enhanced context representation and irrelevant feature fusion. Our approach introduces a CNN and Transformer based dual encoder unit for local features and global context information extraction and hence strength the model’s robustness against various illumination conditions. Then an attention fusion module is utilized to combine local feature and global context information and to select instrument-related boundary features. To bridge the semantic gap between encoder and decoder, we propose a parallel dual cross-attention (DCA) block to capture the channel and spatial dependencies across multi-scale encoder to build the enhanced skip connection. Experimental results show that the proposed method achieves state-of-the-art performance on Endovis2017 and Kvasir-instrument datasets.</p></div>\",\"PeriodicalId\":54638,\"journal\":{\"name\":\"Pattern Recognition Letters\",\"volume\":\"185 \",\"pages\":\"Pages 130-136\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2024-07-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167865524002228\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865524002228","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

目前，微创手术已被广泛应用于降低手术风险，而从内窥镜视频中自动、准确地分割器械对于计算机辅助手术引导至关重要。然而，随着基于 CNN 的手术器械分割方法的快速发展，运动模糊和光照问题等挑战仍可能导致错误分割。在这项工作中，我们提出了一种新颖的双编码器和交叉注意网络（DECA-Net），通过增强上下文表示和不相关特征融合来克服这些限制。我们的方法引入了基于 CNN 和变换器的双编码器单元，用于局部特征和全局上下文信息提取，从而增强了模型在各种光照条件下的鲁棒性。然后，利用注意力融合模块将局部特征和全局上下文信息结合起来，并选择与仪器相关的边界特征。为了弥合编码器和解码器之间的语义鸿沟，我们提出了并行双交叉注意（DCA）模块，以捕捉多尺度编码器之间的通道和空间依赖性，从而建立增强的跳转连接。实验结果表明，所提出的方法在 Endovis2017 和 Kvasir-instrument 数据集上达到了最先进的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

DECA-Net: Dual encoder and cross-attention fusion network for surgical instrument segmentation

Minimally invasive surgery is now widely used to reduce surgical risks, and automatic and accurate instrument segmentation from endoscope videos is crucial for computer-assisted surgical guidance. However, given the rapid development of CNN-based surgical instrument segmentation methods, challenges like motion blur and illumination issues can still cause erroneous segmentation. In this work, we propose a novel dual encoder and cross-attention network (DECA-Net) to overcome these limitations with enhanced context representation and irrelevant feature fusion. Our approach introduces a CNN and Transformer based dual encoder unit for local features and global context information extraction and hence strength the model’s robustness against various illumination conditions. Then an attention fusion module is utilized to combine local feature and global context information and to select instrument-related boundary features. To bridge the semantic gap between encoder and decoder, we propose a parallel dual cross-attention (DCA) block to capture the channel and spatial dependencies across multi-scale encoder to build the enhanced skip connection. Experimental results show that the proposed method achieves state-of-the-art performance on Endovis2017 and Kvasir-instrument datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Pattern Recognition Letters 工程技术-计算机：人工智能

CiteScore

12.40

自引率

5.90%

发文量

287

审稿时长

9.1 months

期刊介绍： Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.