GPT4LFS (generative pretrained transformer 4 omni for lumbar foramina stenosis): enhancing lumbar foraminal stenosis image classification through large multimodal models.

IF 4.9 | Zone 1, Medicine | Q1, Clinical Neurology

Spine Journal Pub Date : 2025-03-27 DOI:10.1016/j.spinee.2025.03.011

Elzat Elham-Yilizati Yilihamu, Fan-Shuo Zeng, Jun Shang, Jin-Tao Yang, Hai Zhong, Shi-Qing Feng


Citations: 0

Abstract

Background context: Lumbar foraminal stenosis (LFS) is a common spinal condition that requires accurate assessment. Current magnetic resonance imaging (MRI) reporting processes are often inefficient, and while deep learning has potential for improvement, challenges in generalization and interpretability limit its diagnostic effectiveness compared to physician expertise.

Purpose: The present study aimed to leverage a multimodal large language model to improve the accuracy and efficiency of LFS image classification, thereby enabling rapid and precise automated diagnosis and reducing the dependence on manually annotated data.

Study design/setting: Retrospective study conducted from April 2017 to March 2023.

Patient sample: Sagittal T1-weighted MRI data for the lumbar spine were collected from 1,200 patients across 3 medical centers. A total of 810 patient cases were included in the final analysis, with data collected from 7 different MRI devices.

Outcome measures: Automated classification of LFS using the multimodal large language model. Accuracy, sensitivity, specificity, and Cohen's Kappa coefficient were calculated.
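The four outcome measures can be computed from paired reference and predicted labels. The following is a minimal sketch, not the authors' evaluation code: the function name `classification_metrics` is hypothetical, and macro-averaging one-vs-rest sensitivity and specificity across grades is an assumption, since the abstract does not state how per-class values were aggregated.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy, macro one-vs-rest sensitivity/specificity, and Cohen's kappa.

    Macro-averaging is an assumption; the abstract does not specify it.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)

    # Observed agreement: fraction of cases on the confusion-matrix diagonal.
    accuracy = sum(pairs[(c, c)] for c in labels) / n

    sens, spec = [], []
    for c in labels:
        tp = pairs[(c, c)]
        fn = true_counts[c] - tp
        fp = pred_counts[c] - tp
        tn = n - tp - fn - fp
        if tp + fn:                        # class present in the reference labels
            sens.append(tp / (tp + fn))
        if tn + fp:                        # at least one case outside the class
            spec.append(tn / (tn + fp))

    # Agreement expected by chance, from the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 1.0

    return {
        "accuracy": accuracy,
        "sensitivity": sum(sens) / len(sens) if sens else float("nan"),
        "specificity": sum(spec) / len(spec) if spec else float("nan"),
        "kappa": kappa,
    }


# Toy labels for illustration only; these are not data from the study.
example = classification_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

Cohen's Kappa corrects the observed agreement for the agreement expected by chance, which is why it is reported alongside raw accuracy.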

Methods: GPT4LFS, a multimodal fusion framework, was developed to integrate imaging data with natural language descriptions and thereby capture complex LFS features. A pretrained ConvNeXt served as the image processing module, extracting high-dimensional imaging features. In parallel, descriptive medical texts generated by the multimodal large language model GPT-4o were encoded with RoBERTa to strengthen the model's contextual understanding. A Mamba architecture was used at the feature fusion stage to integrate the imaging and textual features before classification. Finally, the stability of the model's results was assessed with classification metrics: accuracy, sensitivity, specificity, and the Kappa coefficient.
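The data flow in the fusion stage can be pictured with a toy, dependency-free sketch. It treats the image feature vector and the text feature vector as a two-step sequence and passes them through a plain elementwise linear state-space recurrence, h_t = a * h_(t-1) + b * x_t, followed by a linear classifier. This recurrence is only a simplified stand-in: Mamba builds on state-space recurrences of this kind but makes their parameters input-dependent, and nothing here reproduces the authors' implementation. All names, feature values, dimensions, and weights below are hypothetical.

```python
import math


def ssm_fuse(tokens, a, b):
    """Run h_t = a * h_(t-1) + b * x_t elementwise and return the final state.

    A fixed-parameter stand-in for a fusion block; NOT Mamba's selective scan.
    """
    h = [0.0] * len(a)
    for x in tokens:
        h = [a_i * h_i + b_i * x_i for a_i, h_i, b_i, x_i in zip(a, h, b, x)]
    return h


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, bias):
    """Linear layer followed by softmax; one weight row per class."""
    logits = [
        sum(w * f for w, f in zip(row, fused)) + b0
        for row, b0 in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical 3-dimensional features standing in for the image encoder
# output and the text encoder output (real embeddings are far larger).
image_feat = [0.5, -1.0, 2.0]
text_feat = [1.0, 0.0, -1.0]

a = [0.5, 0.5, 0.5]   # state decay (assumed fixed; Mamba learns and adapts it)
b = [1.0, 1.0, 1.0]   # input gain (assumed fixed)

fused = ssm_fuse([image_feat, text_feat], a, b)
probs = classify(fused, weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], bias=[0.0, 0.0])
```

Because the state carries a decayed copy of the image features into the step that reads the text features, the final state mixes both modalities, which is the role the abstract assigns to the fusion stage.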

Results: The training set comprised 6,299 images from 635 patients, the internal test set included 820 images from 82 patients, and the external test set was composed of 930 images from 93 patients. The GPT4LFS model demonstrated an overall accuracy of 93.7%, sensitivity of 95.8%, and specificity of 94.5% in the internal test set (Kappa=0.89, 95% confidence interval (CI): 0.84-0.96, p<.001). In the external test set, the overall accuracy was 92.2%, with a sensitivity of 92.2% and a specificity of 97.4% (Kappa=0.88, 95% CI: 0.84-0.89, p<.001). The model showed excellent consistency on both the internal and external test sets. The code is freely accessible on GitHub at the following repository: https://github.com/ElzatElham/GPT4LFS.
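The reported split sizes can be cross-checked against the patient sample with a few lines of arithmetic. The counts below are taken from the abstract; the variable names are illustrative.

```python
# Counts as reported in the abstract.
patients = {"training": 635, "internal test": 82, "external test": 93}
images = {"training": 6299, "internal test": 820, "external test": 930}
collected, included = 1200, 810

total_patients = sum(patients.values())
total_images = sum(images.values())
excluded = collected - included

# The three splits should account for every included patient.
assert total_patients == included

for name in patients:
    share = patients[name] / included
    per_patient = images[name] / patients[name]
    print(f"{name}: {share:.1%} of patients, {per_patient:.2f} images per patient")
```

The three splits sum to the 810 included patients, and both test sets work out to exactly 10 images per patient.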

Conclusions: In LFS image classification, the GPT4LFS model demonstrated accuracy and a capacity for feature description commensurate with those of professional clinicians.




Source journal

Spine Journal (Medicine: Clinical Neurology)

CiteScore: 8.20

Self-citation rate: 6.70%

Articles published: 680

Review time: 13.1 weeks

About the journal: The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.