Wanqing Wang; Fucheng Liu; Jianxiong Hao; Xiangyang Yu; Bo Zhang; Chaoyang Shi

IEEE Transactions on Medical Robotics and Bionics, vol. 7, no. 1, pp. 254-265. DOI: 10.1109/TMRB.2024.3517139. Published 2024-12-13. JCR: Q2 (Engineering, Biomedical), Impact Factor 3.4. https://ieeexplore.ieee.org/document/10798614/
Desmoking of the Endoscopic Surgery Images Based on a Local-Global U-Shaped Transformer Model
In robot-assisted minimally invasive surgery (RMIS), the smoke generated by energy-based surgical instruments blurs and obstructs the endoscopic surgical field, which increases the difficulty and risk of robotic surgery. However, current desmoking research primarily focuses on natural weather conditions, with limited studies addressing desmoking techniques for endoscopic images. Furthermore, surgical smoke presents a notably intricate morphology, and research efforts aimed at uniform, non-uniform, thin, and dense smoke remain relatively limited. This work proposes a Local-Global U-Shaped Transformer Model (LGUformer) based on the U-Net and Transformer architectures to remove complex smoke from endoscopic images. By introducing a local-global multi-head self-attention mechanism and multi-scale depthwise convolution, the proposed model enhances the inference capability. An enhanced feature map fusion method improves the quality of reconstructed images. The improved modules enable efficient handling of variable smoke while generating superior-quality images. Through desmoking experiments on synthetic and real smoke images, the LGUformer model demonstrated superior performance compared with seven other desmoking models in terms of accuracy, clarity, absence of distortion, and robustness. A task-based surgical instrument segmentation experiment indicated the potential of this model as a pre-processing step in visual tasks. Finally, an ablation study was conducted to verify the advantages of the proposed modules.
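The abstract describes a local-global multi-head self-attention mechanism in which some heads attend within local windows while others attend over the whole feature map. The paper's actual implementation is not given here, so the following is only a minimal toy sketch of that general idea in NumPy: one "local" head restricted to non-overlapping windows of tokens and one "global" head attending over all tokens, with the two outputs concatenated along the channel axis. The function names, the single-head simplification, and the window size are illustrative assumptions, not the LGUformer design.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention over the token axis
    scale = 1.0 / np.sqrt(q.shape[-1])
    return softmax(q @ k.transpose(0, 2, 1) * scale) @ v

def local_global_attention(x, window=4):
    """Toy local-global self-attention (illustrative, not the paper's code):
    a 'local' head attends within non-overlapping windows of `window` tokens,
    a 'global' head attends over all tokens; outputs are concatenated."""
    n, d = x.shape
    assert n % window == 0, "token count must be divisible by the window size"
    # global head: full attention over all n tokens
    g = attention(x[None], x[None], x[None])[0]
    # local head: attention restricted to each window, batched over windows
    xw = x.reshape(n // window, window, d)
    l = attention(xw, xw, xw).reshape(n, d)
    return np.concatenate([l, g], axis=-1)

tokens = np.random.default_rng(0).normal(size=(8, 16))  # 8 tokens, dim 16
out = local_global_attention(tokens, window=4)
print(out.shape)  # (8, 32)
```

In the full model such local and global heads would operate on flattened image patches inside each U-shaped encoder/decoder stage; here the windowing simply caps each local head's attention span so its cost grows linearly with token count, which is the usual motivation for mixing local and global heads.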