Daniel Spitzl, Markus Mergen, Rickmer Braren, Lukas Endrös, Matthias Eiber, Lisa Steinhelfer
International Journal of Medical Informatics, volume 204, Article 106053. Published 2025-07-19. DOI: 10.1016/j.ijmedinf.2025.106053. Impact factor: 4.1; JCR Q2 (Computer Science, Information Systems). URL: https://www.sciencedirect.com/science/article/pii/S1386505625002709
LLM-powered breast cancer staging from PET/CT reports: a comparative performance study
Purpose
Imaging reports are crucial in breast cancer management, with the tumor-node-metastasis (TNM) classification serving as a widely used model for assessing disease severity, guiding treatment decisions, and predicting patient outcomes. Large language models (LLMs) could support this process by extracting standardized UICC TNM classifications and the corresponding UICC stage directly from existing PET/CT reports. This approach holds promise to enhance staging accuracy, streamline multidisciplinary discussions, and improve patient outcomes.
Methods
Here, we evaluated four LLMs—ChatGPT-4o, DeepSeek V3, Claude 3.5 Sonnet, and Gemini 2.0 Flash—for their capacity to determine TNM staging based on UICC/AJCC breast cancer guidelines. A total of 111 fictitious PET/CT reports were analyzed, and each model’s outputs were measured against expert-generated TNM classifications and stage categorizations.
Results
Among the tested models, Claude 3.5 Sonnet achieved the highest F1 scores: 0.95, 0.95, and 1.00 for T, N, and M classification, and 0.92 for UICC stage classification.
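F1 scores of the kind reported above can be computed from paired expert and model labels. The following is a minimal sketch, assuming macro-averaging over classes; the abstract does not state which averaging the authors used, and the function names and toy labels here are hypothetical, not taken from the study.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for every label seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # model predicted p, but it was wrong
            fn[g] += 1          # model missed the true class g
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2*TP / (2*TP + FP + FN)
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy T-category labels, not data from the study.
gold_t = ["T1", "T2", "T2", "T3", "T4"]
pred_t = ["T1", "T2", "T3", "T3", "T4"]
print(round(macro_f1(gold_t, pred_t), 3))
```

The same functions would apply unchanged to N, M, and overall stage labels, since they only compare two equal-length lists of category strings.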
Conclusions
These findings underscore the ability of advanced natural language processing (NLP) technologies to support reliable cancer staging, potentially aiding clinicians. Despite the encouraging performance, prospective clinical trials and validation across diverse practice settings remain critical to confirming these preliminary outcomes. Nonetheless, this study highlights the promise of LLM-based systems in reinforcing the accuracy of oncologic workflows and lays the groundwork for broader adoption of AI-driven tools in breast cancer management.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.