Deep Learning for Autonomous Surgical Guidance Using 3-Dimensional Images From Forward-Viewing Endoscopic Optical Coherence Tomography

Sinaro Ly, Adrien Badré, Parker Brandt, Chen Wang, Paul Calle, Justin Reynolds, Qinghao Zhang, Kar-Ming Fung, Haoyang Cui, Zhongxin Yu, Sanjay G Patel, Yunlong Liu, Nathan A Bradley, Qinggong Tang, Chongle Pan

Journal of Biophotonics, published 2025-07-25. DOI: 10.1002/jbio.202500181
Abstract
A three-dimensional convolutional neural network (3D-CNN) was developed to analyze volumetric optical coherence tomography (OCT) images and enhance endoscopic guidance during percutaneous nephrostomy. The model was benchmarked using a 10-fold nested cross-validation procedure and achieved an average test accuracy of 90.57% across a dataset of 10 porcine kidneys. This performance significantly exceeded that of 2D-CNN models, which attained average test accuracies of 85.63% to 88.22% using 1, 10, or 100 radial sections extracted from the 3D OCT volumes. The 3D-CNN (~12 million parameters) was also benchmarked against three state-of-the-art volumetric architectures: the 3D Vision Transformer (3D-ViT, ~45 million parameters), 3D-DenseNet121 (~12 million parameters), and the Multi-plane and Multi-slice Transformer (M3T, ~29 million parameters). While these models achieved comparable inference accuracy, the 3D-CNN exhibited lower inference latency (33 ms) than 3D-ViT (86 ms), 3D-DenseNet121 (58 ms), and M3T (93 ms), a critical advantage for real-time surgical guidance. These results demonstrate the 3D-CNN's capability as a powerful and practical tool for computer-aided diagnosis in OCT-guided surgical interventions.
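To make the described pipeline concrete, below is a minimal PyTorch sketch of a 3D-CNN classifier for single-channel OCT volumes, together with a rough single-volume latency measurement in the spirit of the comparison above. The architecture, layer widths, input size (64×64×64), and class count are illustrative assumptions and are not the paper's actual model; measured latency will depend on hardware.

```python
# Minimal, illustrative sketch of a 3D-CNN classifier for volumetric OCT data.
# All layer sizes, the input shape (1x64x64x64), and the class count are
# assumptions for illustration; they are not taken from the paper.
import time
import torch
import torch.nn as nn

class Simple3DCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),              # 64^3 -> 32^3
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),              # 32^3 -> 16^3
            nn.Conv3d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm3d(128),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),      # global pooling -> 128 features
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)
        return self.classifier(x)

if __name__ == "__main__":
    model = Simple3DCNN().eval()
    volume = torch.randn(1, 1, 64, 64, 64)   # one single-channel OCT volume
    with torch.no_grad():
        model(volume)                          # warm-up pass before timing
        start = time.perf_counter()
        logits = model(volume)
        elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"logits shape: {tuple(logits.shape)}, latency: {elapsed_ms:.1f} ms")
```

The warm-up pass before timing avoids counting one-time allocation and kernel-initialization costs, which is a common convention when reporting per-volume inference latency.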