Deep learning-based multi-element identification system for percutaneous endoscopic spine surgery: development and comparative evaluation of neural network models

IF 3.5 | Zone 2 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Intelligence Pub Date : 2025-09-02 DOI:10.1007/s10489-025-06641-9

Jinhui Bu, Yari Wang, Jiaqi Zhao, Sen Huang, Jun Liang, Zhenfei Wang, Long Xu, Yan Lei, Bo He, Minghui Dong, Guangpu Liu, Ru Niu, Chao Ma, Guangwang Liu

Applied Intelligence, volume 55 (11). Journal Article. https://link.springer.com/article/10.1007/s10489-025-06641-9

Citations: 0

Abstract

As medical equipment and technology advance, surgical options for spinal disease have gradually shifted from traditional open surgery to a range of minimally invasive endoscopic procedures. Among these, percutaneous endoscopic discectomy is one of the main procedures for treating lumbar disc herniation and lumbar spinal stenosis. Deep learning has also shown promising results in clinical diagnosis and treatment. This report aims to describe, for the first time, a deep learning-based multi-element identification system for the visual field in percutaneous endoscopic spine surgery and to evaluate its feasibility. We established an image database from surgical videos of 80 patients diagnosed with lumbar disc herniation and lumbar spinal stenosis, with labeling performed by two spinal surgeons. We selected 10,000 images of the visual field in percutaneous endoscopic spine surgery (covering various tissue structures and surgical instruments) and divided them into training, validation, and test sets in a 3:1:1 ratio. We developed three neural network models based on instance segmentation: VMamba, Mask RCNN, and HIRI-ViT-B. Mean average precision (mAP) and frames per second (FPS) were used to measure each model's performance in real-time classification, localization, and recognition, and AP (average) was used to evaluate how easily each element is detected by the networks. Taking into account the structural characteristics and comparative performance of the models, the test-set results show that VMamba (SSM) performs best in bounding box detection (mAP = 79.1%) and contour segmentation (mAP = 81.6%), while HIRI-ViT-B is faster in real-time image processing (FPS = 42.7).
Combining each element's average precision in the bounding box detection and segmentation tasks across the networks, AP (average) among the instruments was highest for tool 3 (bbox 0.91, segm 0.89) and lowest for tool 5 (bbox 0.80, segm 0.74). Among the tissue elements, bounding box detection and contour segmentation accuracy was highest for the ligamentum flavum (bbox 0.80, segm 0.75) and lowest for extradural fat (bbox 0.57, segm 0.54). This study creates the first instance segmentation-based dataset covering multiple elements (anatomical tissue, surgical instruments) within the field of view of spinal endoscopic surgery, and integrates computer vision with spinal endoscopic surgery by developing several neural networks that recognize, classify, and segment the target elements in the dataset throughout the whole operation. Having compared the three models, VMamba, Mask RCNN, and HIRI-ViT-B, we recommend the VMamba (SSM) model for an intraoperative real-time assistance system for spinal endoscopic surgery.
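The 3:1:1 split and the overlap measure that typically underlies segmentation AP can be illustrated with a minimal sketch. The abstract does not say how the split was drawn or which IoU thresholds were used, so everything here is an assumption for illustration: the function names `split_by_ratio` and `mask_iou` are hypothetical, the shuffle is a plain seeded random shuffle over image indices, and masks are small nested lists of 0/1 rather than real endoscopic frames.

```python
import random


def split_by_ratio(items, ratio=(3, 1, 1), seed=0):
    """Shuffle items and split them into train/val/test by an integer ratio.

    Hypothetical helper: the paper's actual splitting procedure is not
    described in the abstract.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratio)
    n_train = len(items) * ratio[0] // total
    n_val = len(items) * ratio[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def mask_iou(mask_a, mask_b):
    """Intersection over union of two equal-sized binary masks (lists of 0/1)."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks have no defined overlap; return 0.0 by convention here.
    return inter / union if union else 0.0


# 10,000 image indices split 3:1:1
train, val, test = split_by_ratio(range(10000))
print(len(train), len(val), len(test))  # 6000 2000 2000
```

With 10,000 images, a 3:1:1 ratio gives 6,000 training, 2,000 validation, and 2,000 test images. A per-image shuffle like this one can place frames from the same patient's video in different sets; whether the authors split per image or per patient is not stated in the abstract.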




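Source journal

Applied Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.60

Self-citation rate: 20.80%

Publications: 1361

Review time: 5.9 months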

Journal description: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.