Laparoscopic stereo matching using 3-Dimensional Fourier transform with full multi-scale features

IF 7.5 · CAS Region 2 (Computer Science) · JCR Q1, AUTOMATION & CONTROL SYSTEMS

Engineering Applications of Artificial Intelligence, Volume 139, Article 109654 · Pub Date: 2024-11-15 · DOI: 10.1016/j.engappai.2024.109654

Renkai Wu, Pengchen Liang, Yinghao Liu, Yiqi Huang, Wangyan Li, Qing Chang

Citations: 0

Abstract

3-Dimensional (3D) reconstruction of laparoscopic surgical scenes is a key task for future surgical navigation and automated robotic minimally invasive surgery. Binocular laparoscopy with stereo matching enables 3D reconstruction. Stereo matching models designed for natural images, such as those from autonomous driving, tend to be less suitable for laparoscopic environments because laparoscopic image datasets are small, textures are complex, and illumination is uneven. In addition, current stereo matching modules use 3D convolutions and transformers in the spatial domain as their base module, which limits them to what can be learned in the spatial domain. In this paper, we propose a model for laparoscopic stereo matching using the 3D Fourier Transform combined with Full Multi-scale Features (FT-FMF Net). Specifically, the proposed Full Multi-scale Fusion Module (FMFM) fuses the full multi-scale feature information from the feature extractor into the stereo matching block, which uses the proposed Dense Fourier Transform Module (DFTM) to densely learn the feature information with parallax, together with the FMFM fusion information, in the frequency domain. We validated the proposed method on both the laparoscopic dataset (SCARED) and the endoscopic dataset (SERV-CT). Compared with other popular and advanced deep learning models currently available, FT-FMF Net achieves state-of-the-art stereo matching performance. On the SCARED and SERV-CT public datasets, the End-Point-Error (EPE) was 0.7265 and 2.3119, and the Root Mean Square Error Depth (RMSE Depth) was 4.00 mm and 3.69 mm, respectively. In addition, the inference time is only 0.17 s. Our project code is available at https://github.com/wurenkai/FT-FMF.
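
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how EPE and RMSE Depth are commonly computed from a predicted and a ground-truth disparity map. It is not taken from the FT-FMF repository: the function names, the rule that a positive value marks a valid pixel, and the example focal length and baseline are all assumptions made for illustration, and the paper's exact evaluation protocol is not stated in the abstract. Depth is recovered with the standard rectified-stereo relation depth = focal length × baseline / disparity.

```python
import math

# Hypothetical helper names; not the authors' evaluation code.

def end_point_error(pred_disp, gt_disp):
    """Mean absolute disparity error over pixels whose ground truth is valid.

    Assumption: a ground-truth disparity <= 0 marks an invalid pixel.
    """
    errors = [abs(p - g) for p, g in zip(pred_disp, gt_disp) if g > 0]
    return sum(errors) / len(errors)


def disparity_to_depth(disparity, focal_px, baseline_mm):
    """Standard rectified-stereo relation: depth = f * B / d (here in mm)."""
    return focal_px * baseline_mm / disparity


def rmse_depth(pred_disp, gt_disp, focal_px, baseline_mm):
    """Root mean square depth error (mm) over pixels valid in both maps."""
    squared = [
        (disparity_to_depth(p, focal_px, baseline_mm)
         - disparity_to_depth(g, focal_px, baseline_mm)) ** 2
        for p, g in zip(pred_disp, gt_disp)
        if g > 0 and p > 0
    ]
    return math.sqrt(sum(squared) / len(squared))


# Toy flattened disparity maps (pixels); the last pixel has no ground truth.
pred = [10.0, 20.0, 40.0, 5.0]
gt = [10.0, 25.0, 40.0, 0.0]

# Assumed camera parameters, chosen only to make the arithmetic easy to follow.
FOCAL_PX = 1000.0
BASELINE_MM = 4.0

epe = end_point_error(pred, gt)
rmse = rmse_depth(pred, gt, FOCAL_PX, BASELINE_MM)
print(f"EPE: {epe:.4f} px, RMSE Depth: {rmse:.2f} mm")
```

Because depth is inversely proportional to disparity, a fixed disparity error costs more depth accuracy at small disparities (distant surfaces) than at large ones, which is one reason the two metrics are usually reported side by side.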

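Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 9.60

Self-citation rate: 10.00%

Articles published: 505

Review time: 68 days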

Journal description: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.