Confirmation of Large Language Models in Head and Neck Cancer Staging.

Impact factor 3.3 · CAS Zone 3 (Medicine) · JCR Q1 · MEDICINE, GENERAL & INTERNAL
Mehmet Kayaalp, Hatice Bölek, Hatime Arzu Yaşar
Diagnostics, vol. 15, no. 18. Journal Article. Published 2025-09-18. DOI: 10.3390/diagnostics15182375 (https://doi.org/10.3390/diagnostics15182375). Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468830/pdf/
Citations: 0

Abstract

Confirmation of Large Language Models in Head and Neck Cancer Staging.

Background/Objectives: Head and neck cancer (HNC) is a heterogeneous group of malignancies in which staging plays a critical role in guiding treatment and prognosis. Large language models (LLMs) such as ChatGPT, DeepSeek, and Grok have emerged as potential tools in oncology, yet their clinical applicability in staging remains unclear. This study aimed to evaluate the accuracy and concordance of LLMs compared to clinician-assigned staging in patients with HNC.

Methods: The medical records of 202 patients with HNC, who presented to our center between 1 January 2010 and 13 February 2025, were retrospectively reviewed. The information obtained from the hospital information system by a junior researcher was re-evaluated by a senior researcher, and standard staging was completed. Except for the stage itself, the data used for staging were provided to a blinded third researcher, who then entered them into the ChatGPT, DeepSeek, and Grok applications with a staging command. After all staging processes were completed, the data were compiled, and clinician-assigned stages were compared with those generated by the LLMs.

Results: The majority of the patients had laryngeal (45.5%) and nasopharyngeal cancer (21.3%). Definitive surgery was performed in 39.6% of the patients. Stage 4 was the most common stage among the patients (54%). The overall concordance rates, Cohen's kappa values, and F1 scores were 85.6%, 0.797, and 0.84 for ChatGPT; 67.3%, 0.522, and 0.65 for DeepSeek; and 75.2%, 0.614, and 0.72 for Grok, respectively, with no statistically significant differences between models. Pathological and surgical staging were found to be similar in terms of concordance. The concordance of assessments utilizing only imaging, only pathology notes, only physical examination notes, and comprehensive information was evaluated, revealing no significant differences.

Conclusions: Large language models (LLMs) demonstrate relatively high accuracy in staging HNC. With careful implementation and with the consideration of prospective studies, these models have the potential to become valuable tools in oncology practice.
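The three agreement measures reported in the Results (concordance rate, Cohen's kappa, and F1 score) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The stage labels below are made-up toy data, the function names are hypothetical, kappa is computed unweighted, and F1 is macro-averaged across stages; the abstract does not state which kappa weighting or F1 averaging scheme was used, so both choices are assumptions.

```python
from collections import Counter


def concordance_rate(reference, predicted):
    """Fraction of cases where the predicted stage equals the reference stage."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def cohens_kappa(reference, predicted):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1,
    i.e. when both raters use a single identical label throughout.
    """
    n = len(reference)
    p_o = concordance_rate(reference, predicted)  # observed agreement
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    labels = set(ref_counts) | set(pred_counts)
    # Agreement expected by chance from the two marginal distributions.
    p_e = sum(ref_counts[label] * pred_counts[label] for label in labels) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def macro_f1(reference, predicted):
    """Unweighted mean of per-stage F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(reference) | set(predicted))
    pairs = list(zip(reference, predicted))
    scores = []
    for label in labels:
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data only: eight hypothetical patients, not taken from the study.
clinician_stage = ["I", "II", "III", "IV", "IV", "IV", "II", "III"]
model_stage = ["I", "II", "IV", "IV", "IV", "III", "II", "III"]

print(f"concordance: {concordance_rate(clinician_stage, model_stage):.3f}")
print(f"kappa:       {cohens_kappa(clinician_stage, model_stage):.3f}")
print(f"macro F1:    {macro_f1(clinician_stage, model_stage):.3f}")
```

Kappa is lower than raw concordance because it discounts the agreement expected by chance, which matters here since more than half of the cohort was stage 4.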

Source journal: Diagnostics
Subject area: Biochemistry, Genetics and Molecular Biology - Clinical Biochemistry
CiteScore: 4.70
Self-citation rate: 8.30%
Articles published: 2699
Review time: 19.64 days
About the journal: Diagnostics (ISSN 2075-4418) is an international scholarly open access journal on medical diagnostics. It publishes original research articles, reviews, communications and short notes on the research and development of medical diagnostics. There is no restriction on the length of the papers. Our aim is to encourage scientists to publish their experimental and theoretical research in as much detail as possible. Full experimental and/or methodological details must be provided for research articles.