GPT4LFS (generative pre-trained transformer 4 omni for lumbar foramina stenosis): enhancing lumbar foraminal stenosis image classification through large multimodal models.
Elzat Elham-Yilizati Yilihamu, Fan-Shuo Zeng, Jun Shang, Jin-Tao Yang, Hai Zhong, Shi-Qing Feng
Spine Journal. Published 2025-03-27. DOI: 10.1016/j.spinee.2025.03.011
Abstract
Background context: Lumbar foraminal stenosis (LFS) is a common spinal condition that requires accurate assessment. Current magnetic resonance imaging (MRI) reporting processes are often inefficient, and while deep learning has potential for improvement, challenges in generalization and interpretability limit its diagnostic effectiveness compared to physician expertise.
Purpose: The present study aimed to leverage a multimodal large language model to improve the accuracy and efficiency of LFS image classification, enabling rapid and precise automated diagnosis and reducing the dependence on manually annotated data.
Study design/setting: Retrospective study conducted from April 2017 to March 2023.
Patient sample: Sagittal T1-weighted MRI data for the lumbar spine were collected from 1,200 patients across three medical centers. A total of 810 patient cases were included in the final analysis, with data collected from seven different MRI devices.
Outcome measures: Automated classification of LFS using the multimodal large language model. Accuracy, sensitivity, specificity, and Cohen's Kappa coefficient were calculated.
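The four outcome measures can be computed from a confusion matrix. The following is a minimal sketch, assuming unweighted Cohen's Kappa and macro-averaged one-vs-rest sensitivity and specificity; the abstract does not state the averaging scheme or the number of stenosis grades, and the function name and toy labels are illustrative only.

```python
def classification_metrics(y_true, y_pred, labels):
    """Accuracy, macro sensitivity/specificity, and unweighted Cohen's Kappa."""
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)

    # Confusion matrix: rows are reference labels, columns are predictions.
    cm = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1

    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    accuracy = sum(cm[i][i] for i in range(k)) / n

    sens, spec = [], []
    for c in range(k):
        tp = cm[c][c]
        fn = row_tot[c] - tp
        fp = col_tot[c] - tp
        tn = n - tp - fn - fp
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)

    # Agreement expected by chance, from the marginal totals.
    p_e = sum(row_tot[c] * col_tot[c] for c in range(k)) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sum(sens) / k,
        "specificity": sum(spec) / k,
        "kappa": kappa,
    }


# Toy labels for illustration only; these are not study data.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
metrics = classification_metrics(y_true, y_pred, labels=[0, 1, 2])
print(metrics)
```

On the toy labels, five of six predictions agree with the reference and chance agreement is 1/3, so Kappa works out to 0.75.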
Methods: A multimodal fusion framework, GPT4LFS, was developed to integrate imaging data with natural language descriptions and thereby capture complex LFS features comprehensively. The model used a pre-trained ConvNeXt as the image processing module to extract high-dimensional imaging features. In parallel, descriptive medical texts generated by the multimodal large language model GPT-4o were encoded with RoBERTa, and the resulting text features were used to improve the model's contextual understanding. The Mamba architecture was applied at the feature fusion stage to integrate the imaging and textual features and enhance classification performance. Finally, the stability of the model's results was assessed using classification metrics: accuracy, sensitivity, specificity, and the Kappa coefficient.
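The data flow described above (image features and text features fused as a short sequence, then classified) can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' implementation: the gated scan below is a simplified stand-in for a Mamba block, the random vectors are placeholders for ConvNeXt and RoBERTa outputs, and `NUM_CLASSES`, `FEATURE_DIM`, and all function names are hypothetical.

```python
import math
import random

NUM_CLASSES = 4  # assumption: the abstract does not state the number of LFS grades
FEATURE_DIM = 8  # assumption: both modalities already projected to a shared width


def gated_scan(tokens, gate_w, in_w):
    """Toy diagonal state-space scan with an input-dependent gate.

    A simplified stand-in for a Mamba block, not the Mamba architecture.
    tokens: equal-length feature vectors, e.g. [image_feat, text_feat].
    Returns the final hidden state, used here as the fused feature vector.
    """
    hidden = [0.0] * len(tokens[0])
    for token in tokens:
        for i, x in enumerate(token):
            # Gate in (0, 1) decides how much past state to keep for this input.
            keep = 1.0 / (1.0 + math.exp(-gate_w[i] * x))
            hidden[i] = keep * hidden[i] + (1.0 - keep) * in_w[i] * x
    return hidden


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(image_feat, text_feat, params):
    """Fuse the two modality vectors, then apply a linear classification head."""
    fused = gated_scan([image_feat, text_feat], params["gate_w"], params["in_w"])
    logits = [
        sum(w * f for w, f in zip(row, fused)) + b
        for row, b in zip(params["head_w"], params["head_b"])
    ]
    return softmax(logits)


def random_params(seed=0):
    """Untrained, randomly initialised parameters for illustration only."""
    rng = random.Random(seed)
    return {
        "gate_w": [rng.uniform(-1, 1) for _ in range(FEATURE_DIM)],
        "in_w": [rng.uniform(-1, 1) for _ in range(FEATURE_DIM)],
        "head_w": [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)]
                   for _ in range(NUM_CLASSES)],
        "head_b": [0.0] * NUM_CLASSES,
    }


rng = random.Random(1)
image_feat = [rng.gauss(0, 1) for _ in range(FEATURE_DIM)]  # placeholder for ConvNeXt output
text_feat = [rng.gauss(0, 1) for _ in range(FEATURE_DIM)]   # placeholder for RoBERTa output
probs = classify(image_feat, text_feat, random_params())
print(probs)
```

The output is a probability vector over the assumed classes. With untrained parameters the values carry no diagnostic meaning; the sketch only shows how the two feature streams meet before classification.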
Results: The training set comprised 6,299 images from 635 patients, the internal test set included 820 images from 82 patients, and the external test set was composed of 930 images from 93 patients. The GPT4LFS model demonstrated an overall accuracy of 93.7%, sensitivity of 95.8%, and specificity of 94.5% in the internal test set (Kappa = 0.89, 95% confidence interval (CI): 0.84-0.96, p<.001). In the external test set, the overall accuracy was 92.2%, with a sensitivity of 92.2% and a specificity of 97.4% (Kappa = 0.88, 95% CI: 0.84-0.89, p<.001). The model showed excellent consistency on both the internal and external test sets. After the article is published, we will make the full code publicly available on GitHub.
Conclusions: In LFS image classification, the GPT4LFS model demonstrated accuracy and a capacity for feature description commensurate with those of professional clinicians.
About the journal:
The Spine Journal, the official journal of the North American Spine Society, is an international and multidisciplinary journal that publishes original, peer-reviewed articles on research and treatment related to the spine and spine care, including basic science and clinical investigations. It is a condition of publication that manuscripts submitted to The Spine Journal have not been published, and will not be simultaneously submitted or published elsewhere. The Spine Journal also publishes major reviews of specific topics by acknowledged authorities, technical notes, teaching editorials, and other special features. Letters to the Editor-in-Chief are encouraged.