Rushmin Khazanchi, Austin R Chen, Parth Desai, Daniel Herrera, Jacob R Staub, Matthew A Follett, Mykhaylo Krushelnytskyy, Hanna Kemeny, Wellington K Hsu, Alpesh A Patel, Srikanth N Divi
{"title":"评估大型语言模型将腰椎成像报告简化为面向患者的文本的能力:GPT-4的试点研究。","authors":"Rushmin Khazanchi, Austin R Chen, Parth Desai, Daniel Herrera, Jacob R Staub, Matthew A Follett, Mykhaylo Krushelnytskyy, Hanna Kemeny, Wellington K Hsu, Alpesh A Patel, Srikanth N Divi","doi":"10.1007/s00256-025-05027-9","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To assess the ability of large language models (LLMs) to accurately simplify lumbar spine magnetic resonance imaging (MRI) reports.</p><p><strong>Materials and methods: </strong>Patients who underwent lumbar decompression and/or fusion surgery in 2022 at one tertiary academic medical center were queried using appropriate CPT codes. We then identified all patients with a preoperative ICD diagnosis of lumbar spondylolisthesis and extracted the latest preoperative spine MRI radiology report text. The GPT-4 API was deployed on deidentified reports with a prompt to produce translations and evaluated for accuracy and readability. An enhanced GPT prompt was constructed using high-scoring reports and evaluated on low-scoring reports.</p><p><strong>Results: </strong>Of 93 included reports, GPT effectively reduced the average reading level (11.47 versus 8.50, p < 0.001). While most reports had no accuracy issues, 34% of translations omitted at least one clinically relevant piece of information, while 6% produced a clinically significant inaccuracy in the translation. An enhanced prompt model using high scoring reports-maintained reading level while significantly improving omission rate (p < 0.0001). However, even in the enhanced prompt model, GPT made several errors regarding location of stenosis, description of prior spine surgery, and description of other spine pathologies.</p><p><strong>Conclusion: </strong>GPT-4 effectively simplifies the reading level of lumbar spine MRI reports. The model tends to omit key information in its translations, which can be mitigated with enhanced prompting. Further validation in the domain of spine radiology needs to be performed to facilitate clinical integration.</p>","PeriodicalId":21783,"journal":{"name":"Skeletal Radiology","volume":" ","pages":""},"PeriodicalIF":2.2000,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing the ability of large language models to simplify lumbar spine imaging reports into patient-facing text: a pilot study of GPT-4.\",\"authors\":\"Rushmin Khazanchi, Austin R Chen, Parth Desai, Daniel Herrera, Jacob R Staub, Matthew A Follett, Mykhaylo Krushelnytskyy, Hanna Kemeny, Wellington K Hsu, Alpesh A Patel, Srikanth N Divi\",\"doi\":\"10.1007/s00256-025-05027-9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objective: </strong>To assess the ability of large language models (LLMs) to accurately simplify lumbar spine magnetic resonance imaging (MRI) reports.</p><p><strong>Materials and methods: </strong>Patients who underwent lumbar decompression and/or fusion surgery in 2022 at one tertiary academic medical center were queried using appropriate CPT codes. We then identified all patients with a preoperative ICD diagnosis of lumbar spondylolisthesis and extracted the latest preoperative spine MRI radiology report text. The GPT-4 API was deployed on deidentified reports with a prompt to produce translations and evaluated for accuracy and readability. An enhanced GPT prompt was constructed using high-scoring reports and evaluated on low-scoring reports.</p><p><strong>Results: </strong>Of 93 included reports, GPT effectively reduced the average reading level (11.47 versus 8.50, p < 0.001). While most reports had no accuracy issues, 34% of translations omitted at least one clinically relevant piece of information, while 6% produced a clinically significant inaccuracy in the translation. An enhanced prompt model using high scoring reports-maintained reading level while significantly improving omission rate (p < 0.0001). However, even in the enhanced prompt model, GPT made several errors regarding location of stenosis, description of prior spine surgery, and description of other spine pathologies.</p><p><strong>Conclusion: </strong>GPT-4 effectively simplifies the reading level of lumbar spine MRI reports. The model tends to omit key information in its translations, which can be mitigated with enhanced prompting. Further validation in the domain of spine radiology needs to be performed to facilitate clinical integration.</p>\",\"PeriodicalId\":21783,\"journal\":{\"name\":\"Skeletal Radiology\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2025-09-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Skeletal Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s00256-025-05027-9\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ORTHOPEDICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Skeletal Radiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s00256-025-05027-9","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Assessing the ability of large language models to simplify lumbar spine imaging reports into patient-facing text: a pilot study of GPT-4.
Objective: To assess the ability of large language models (LLMs) to accurately simplify lumbar spine magnetic resonance imaging (MRI) reports.
Materials and methods: Patients who underwent lumbar decompression and/or fusion surgery in 2022 at one tertiary academic medical center were queried using appropriate CPT codes. We then identified all patients with a preoperative ICD diagnosis of lumbar spondylolisthesis and extracted the latest preoperative spine MRI radiology report text. The GPT-4 API was deployed on deidentified reports with a prompt to produce translations and evaluated for accuracy and readability. An enhanced GPT prompt was constructed using high-scoring reports and evaluated on low-scoring reports.
Results: Of 93 included reports, GPT effectively reduced the average reading level (11.47 versus 8.50, p < 0.001). While most reports had no accuracy issues, 34% of translations omitted at least one clinically relevant piece of information, while 6% produced a clinically significant inaccuracy in the translation. An enhanced prompt model using high scoring reports-maintained reading level while significantly improving omission rate (p < 0.0001). However, even in the enhanced prompt model, GPT made several errors regarding location of stenosis, description of prior spine surgery, and description of other spine pathologies.
Conclusion: GPT-4 effectively simplifies the reading level of lumbar spine MRI reports. The model tends to omit key information in its translations, which can be mitigated with enhanced prompting. Further validation in the domain of spine radiology needs to be performed to facilitate clinical integration.
期刊介绍:
Skeletal Radiology provides a forum for the dissemination of current knowledge and information dealing with disorders of the musculoskeletal system including the spine. While emphasizing the radiological aspects of the many varied skeletal abnormalities, the journal also adopts an interdisciplinary approach, reflecting the membership of the International Skeletal Society. Thus, the anatomical, pathological, physiological, clinical, metabolic and epidemiological aspects of the many entities affecting the skeleton receive appropriate consideration.
This is the Journal of the International Skeletal Society and the Official Journal of the Society of Skeletal Radiology and the Australasian Musculoskelelal Imaging Group.