A novel deep learning system for automated diagnosis and grading of lumbar spinal stenosis based on spine MRI: model development and validation.

Impact factor: 3.0 · Zone 2, Medicine · JCR Q2, Clinical Neurology
Tianyi Wang, Aobo Wang, Yiling Zhang, Xingyu Liu, Ning Fan, Shuo Yuan, Peng Du, Qichao Wu, Ruiyuan Chen, Yu Xi, Zhao Gu, Qi Fei, Lei Zang
Neurosurgical Focus 59(1):E6. Journal article, published 2025-07-01. DOI: 10.3171/2025.4.FOCUS24670 (https://doi.org/10.3171/2025.4.FOCUS24670)
Citations: 0

Abstract

Objective: The study aimed to develop a single-stage deep learning (DL) screening system for automated binary and multiclass grading of lumbar central stenosis (LCS), lateral recess stenosis (LRS), and lumbar foraminal stenosis (LFS).

Methods: Consecutive inpatients who underwent lumbar MRI at our center were retrospectively reviewed for the internal dataset. Axial and sagittal lumbar MRI scans were collected. Based on a new MRI diagnostic criterion, all MRI studies were labeled by two spine specialists and calibrated by a third spine specialist to serve as the reference standard. In addition, two spine clinicians labeled all MRI studies independently so that their interobserver reliability could be compared with that of the DL model. Samples were assigned to training, validation, and test sets in an 8:1:1 ratio. Additional patients from another center were enrolled as the external test dataset. A modified single-stage YOLOv5 network was designed for simultaneous detection of regions of interest (ROIs) and grading of LCS, LRS, and LFS. Quantitative metrics of the model's accuracy and reliability were computed.
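As an illustration of the 8:1:1 assignment described in the Methods, the following is a minimal sketch, not the authors' code. It assumes the split is made at the patient level (the abstract says only that "samples" were assigned), and the function name `split_patients`, the integer patient IDs, and the fixed seed are all hypothetical.

```python
import random


def split_patients(patient_ids, ratios=(8, 1, 1), seed=0):
    """Shuffle patient IDs and cut them into train/validation/test subsets.

    Assumption: splitting by patient keeps all images of one patient in the
    same subset. The abstract does not state the unit that was split.
    """
    ids = sorted(patient_ids)          # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)   # seeded shuffle, independent of global state

    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total

    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


# 420 is the internal cohort size reported in the Results.
train, val, test = split_patients(range(420))
print(len(train), len(val), len(test))
```

With 420 patients this yields subsets of 336, 42, and 42. The 50 external patients would be kept entirely separate and used only for testing.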

Results: In total, 420 and 50 patients were enrolled in the internal and external datasets, respectively. Recall for ROI detection of lumbar spinal stenosis (LSS) was 97.4%-99.8%. The system achieved multigrade area under the curve (AUC) values of 0.93-0.97 in the internal test set and 0.85-0.94 in the external test set for LCS, LRS, and LFS. In binary grading on the internal test set, the DL model achieved sensitivities of 0.97 for LCS, 0.98 for LRS, and 0.96 for LFS, slightly better than those achieved by spine clinicians. In the external test set, the binary sensitivities were 0.98 for LCS, 0.96 for LRS, and 0.95 for LFS. For reliability assessment, the kappa coefficients between the DL model and the reference standard were 0.92, 0.88, and 0.91 for LCS, LRS, and LFS, respectively, slightly higher than those of nonexpert spine clinicians.
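The sensitivity and kappa figures above are standard agreement metrics; the sketch below shows how such values are typically computed, using made-up toy labels rather than study data. Several choices here are assumptions not stated in the abstract: the kappa is the unweighted Cohen's kappa (the abstract does not say whether a weighted variant was used), grades are coded 0-3, and any grade of 1 or higher counts as positive for binary grading.

```python
from collections import Counter


def sensitivity(y_true, y_pred, positive=1):
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical): reference-standard grades vs. model grades.
reference = [0, 0, 1, 1, 2, 2, 3, 3]
model = [0, 1, 1, 1, 2, 3, 3, 3]

# Assumed binarization: grade >= 1 means stenosis present.
ref_bin = [int(g >= 1) for g in reference]
model_bin = [int(g >= 1) for g in model]

print("binary sensitivity:", sensitivity(ref_bin, model_bin))
print("multigrade kappa:", round(cohen_kappa(reference, model), 3))
```

For ordinal grades, a weighted kappa that penalizes distant disagreements more heavily is a common alternative; which variant the authors used cannot be determined from the abstract.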

Conclusions: The authors designed a novel DL system that demonstrated promising performance, especially in sensitivity, for automated diagnosis and grading of different types of lumbar spinal stenosis using spine MRI. The reliability of the system was better than that of spine surgeons. The authors' system may serve as a triage tool for LSS to reduce misdiagnosis and optimize routine processes in clinical work.

Source journal
Neurosurgical focus (CLINICAL NEUROLOGY-SURGERY)
CiteScore: 6.30
Self-citation rate: 0.00%
Articles published: 261
Review time: 3 months