Deep Learning for Autonomous Surgical Guidance Using 3-Dimensional Images From Forward-Viewing Endoscopic Optical Coherence Tomography

Sinaro Ly, Adrien Badré, Parker Brandt, Chen Wang, Paul Calle, Justin Reynolds, Qinghao Zhang, Kar-Ming Fung, Haoyang Cui, Zhongxin Yu, Sanjay G Patel, Yunlong Liu, Nathan A Bradley, Qinggong Tang, Chongle Pan

Journal of Biophotonics, published 2025-07-25. DOI: 10.1002/jbio.202500181
Abstract
A three-dimensional convolutional neural network (3D-CNN) was developed to analyze volumetric optical coherence tomography (OCT) images and enhance endoscopic guidance during percutaneous nephrostomy. The model was benchmarked using a 10-fold nested cross-validation procedure and achieved an average test accuracy of 90.57% across a dataset of 10 porcine kidneys. This performance significantly exceeded that of 2D-CNN models, which attained average test accuracies of 85.63% to 88.22% using 1, 10, or 100 radial sections extracted from the 3D OCT volumes. The 3D-CNN (~12 million parameters) was also benchmarked against three state-of-the-art volumetric architectures: the 3D Vision Transformer (3D-ViT, ~45 million parameters), 3D-DenseNet121 (~12 million parameters), and the Multi-plane and Multi-slice Transformer (M3T, ~29 million parameters). While these models achieved comparable inference accuracy, the 3D-CNN exhibited lower inference latency (33 ms) than 3D-ViT (86 ms), 3D-DenseNet121 (58 ms), and M3T (93 ms), a critical advantage for real-time surgical guidance. These results demonstrate the 3D-CNN's capability as a powerful and practical tool for computer-aided diagnosis in OCT-guided surgical interventions.
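To make the described pipeline concrete, below is a minimal PyTorch sketch of a 3D-CNN classifier for single-channel OCT volumes, together with a rough single-volume latency measurement in the spirit of the comparison above. The architecture, layer widths, input size (64×64×64), and class count are illustrative assumptions and are not the paper's actual model; measured latency will depend on hardware.

```python
# Minimal, illustrative sketch of a 3D-CNN classifier for volumetric OCT data.
# All layer sizes, the input shape (1x64x64x64), and the class count are
# assumptions for illustration; they are not taken from the paper.
import time
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),              # 64^3 -> 32^3
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),              # 32^3 -> 16^3
            nn.Conv3d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm3d(128),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),      # global pooling -> 128 features
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)
        return self.classifier(x)

if __name__ == "__main__":
    model = Simple3DCNN().eval()
    volume = torch.randn(1, 1, 64, 64, 64)   # one single-channel OCT volume
    with torch.no_grad():
        model(volume)                          # warm-up pass before timing
        start = time.perf_counter()
        logits = model(volume)
        elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"logits shape: {tuple(logits.shape)}, latency: {elapsed_ms:.1f} ms")
```

The warm-up pass before timing avoids counting one-time allocation and kernel-initialization costs, which is a common convention when reporting per-volume inference latency.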