Deep learning-based multi-element identification system for percutaneous endoscopic spine surgery: development and comparative evaluation of neural network models
IF 3.5 · Zone 2 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Jinhui Bu, Yari Wang, Jiaqi Zhao, Sen Huang, Jun Liang, Zhenfei Wang, Long Xu, Yan Lei, Bo He, Minghui Dong, Guangpu Liu, Ru Niu, Chao Ma, Guangwang Liu
Applied Intelligence 55 (11). Published 2025-09-02. DOI: 10.1007/s10489-025-06641-9. https://link.springer.com/article/10.1007/s10489-025-06641-9
Citations: 0
Abstract
As medical equipment and technology advance, surgical options for spinal disease have gradually shifted from traditional open surgery to a range of minimally invasive endoscopic procedures. Among these, percutaneous endoscopic discectomy is one of the main procedures for treating lumbar disc herniation and lumbar spinal stenosis. Deep learning has already shown promising results in clinical diagnosis and treatment. This report aims to provide a first description of a deep learning-based multi-element identification system for the visual field in percutaneous endoscopic spine surgery and to evaluate its feasibility. We established an image database by collecting surgical videos from 80 patients diagnosed with lumbar disc herniation and lumbar spinal stenosis, with labeling performed by two spinal surgeons. We selected 10,000 images of the visual field of percutaneous endoscopic spine surgery (including various tissue structures and surgical instruments) and divided them into training, validation, and test sets at a ratio of 3:1:1. We developed three neural network models based on instance segmentation: VMamba, Mask RCNN, and HIRI-ViT-B. Mean average precision (mAP) and frames per second (FPS) were used to measure each model's performance in classification, localization, and real-time recognition, and AP (average) was used to evaluate how easily each element is detected by the networks. Taking into account the structural characteristics and comparative performance of the models, results on the test dataset show that VMamba (SSM) performs best in bounding box detection (mAP = 79.1%) and contour segmentation (mAP = 81.6%), while HIRI-ViT-B is faster in real-time image processing (FPS = 42.7). Combining the average precision of each element across the bounding box and segmentation tasks in each network, the AP (average) among the instruments was highest for tool 3 (bbox 0.91, segm 0.89) and lowest for tool 5 (bbox 0.80, segm 0.74). Among the tissue elements, the accuracy of bounding box detection and contour segmentation was highest for the ligamentum flavum (bbox 0.80, segm 0.75) and lowest for extradural fat (bbox 0.57, segm 0.54). This study creates the first instance segmentation-based dataset focusing on multiple elements (anatomical tissue, surgical instruments) within the field of view of spinal endoscopic surgery, and it integrates computer vision with spinal endoscopic surgery by developing neural networks that recognize, classify, and segment the target elements in the dataset throughout the whole operation. Comparing the three models (VMamba, Mask RCNN, HIRI-ViT-B), we recommend the VMamba (SSM) model for an intraoperative real-time assistance system for spinal endoscopic surgery.
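As a rough illustration of the quantities named in the abstract, the following Python sketch shows a 3:1:1 split of 10,000 items, the intersection-over-union (IoU) overlap that bounding box AP is typically built on, and mAP taken as the mean of per-class AP values. This is not the authors' code. The function names, the random image-level shuffle, and the example AP values are all assumptions made for illustration; the abstract does not say how the split was drawn (for example, per image or per patient) or which AP protocol was used.

```python
import random


def split_3_1_1(items, seed=0):
    """Shuffle items and split them into train/val/test at a 3:1:1 ratio.

    Assumption: a simple random item-level split; the abstract does not
    state how the split was actually performed.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 3 // 5
    n_val = n // 5
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_average_precision(per_class_ap):
    """mAP as the unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


train, val, test = split_3_1_1(range(10000))
print(len(train), len(val), len(test))  # 6000 2000 2000

# Hypothetical per-class AP values, not results from the paper.
print(mean_average_precision({"class_a": 0.5, "class_b": 1.0}))  # 0.75
```

With 10,000 images, a 3:1:1 ratio corresponds to 6,000 training, 2,000 validation, and 2,000 test images. Splitting by patient rather than by image would keep frames from the same surgery out of both training and test sets; which approach the study used is not stated in the abstract.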
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.