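Assessing the quality of Japanese online breast cancer treatment information using large language models: a comparison of ChatGPT, Claude, and expert evaluations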

Impact factor: 2.9
Breast cancer (Tokyo, Japan), pp. 960-969. Pub Date: 2025-09-01; Epub Date: 2025-05-21. DOI: 10.1007/s12282-025-01719-1
Atsushi Fushimi, Mitsuo Terada, Rie Tahara, Yuko Nakazawa, Madoka Iwase, Tomoko Shibayama, Samy Kotti, Nami Yamashita, Asumi Iesato
Citations: 0

Abstract


Background: The internet is a primary source of health information for breast cancer patients, but the quality of online content varies widely. This study aimed to evaluate the capability of large language models (LLMs), including ChatGPT and Claude, to assess the quality of online Japanese breast cancer treatment information by calculating DISCERN scores with each LLM and comparing them with the scores assigned by expert raters.

Methods: We analyzed 60 Japanese web pages on breast cancer treatments (surgery, chemotherapy, immunotherapy) using the DISCERN instrument. Each page was evaluated by two LLMs, ChatGPT and Claude, and by two expert raters. We assessed the consistency of the LLM evaluations, the correlations between LLM and expert assessments, and the relationships among DISCERN scores, Google search rankings, and content length.
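The DISCERN instrument consists of 16 questions, each rated on a scale of 1 (low) to 5 (high). The abstract does not state how item ratings were aggregated or how the two experts' ratings were combined, so the sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes a page's total is the sum of all 16 item ratings (range 16 to 80) and that a single expert score is the mean of the two raters' totals. The helper names `discern_total` and `expert_score` are hypothetical.

```python
DISCERN_ITEMS = 16            # DISCERN has 16 questions
MIN_RATING, MAX_RATING = 1, 5  # each question is rated from 1 to 5


def discern_total(ratings: list[int]) -> int:
    """Sum one rater's 16 item ratings for a page.

    Assumption: the total is the plain sum of all 16 items (range 16-80).
    """
    if len(ratings) != DISCERN_ITEMS:
        raise ValueError(f"expected {DISCERN_ITEMS} ratings, got {len(ratings)}")
    for r in ratings:
        if not MIN_RATING <= r <= MAX_RATING:
            raise ValueError(f"rating {r} outside {MIN_RATING}-{MAX_RATING}")
    return sum(ratings)


def expert_score(rater_a: list[int], rater_b: list[int]) -> float:
    """Hypothetical consensus score: the mean of two expert raters' totals."""
    return (discern_total(rater_a) + discern_total(rater_b)) / 2


if __name__ == "__main__":
    # Toy ratings for a single page, for illustration only (not study data).
    rater_a = [4] * 8 + [3] * 8
    rater_b = [3] * 16
    print(discern_total(rater_a), discern_total(rater_b))  # 56 48
    print(expert_score(rater_a, rater_b))                  # 52.0
```

Validating the item count and rating range up front makes malformed output (for example, an LLM response that skips a question) fail loudly instead of silently lowering a page's total.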

Results: The LLM evaluations showed high consistency and moderate to strong correlations with the expert assessments (ChatGPT vs Expert: r = 0.65; Claude vs Expert: r = 0.68). The LLMs assigned slightly higher scores than the expert raters. Chemotherapy pages received the highest quality scores, followed by surgery and then immunotherapy pages. We found a weak negative correlation between Google search ranking and DISCERN scores, and a moderate positive correlation (r = 0.45) between content length and quality ratings.
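The results summarize LLM-expert agreement as a correlation coefficient r and note that the LLMs scored slightly higher than the experts. The abstract does not name the coefficient; assuming it is Pearson's product-moment correlation, per-page agreement and the systematic offset could be computed as in this minimal sketch. `pearson_r` and `mean_difference` are hypothetical helpers, and the score lists are made-up toy values, not data from the study.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Covariance numerator and the two standard-deviation terms (unscaled).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (sx * sy)


def mean_difference(llm: list[float], expert: list[float]) -> float:
    """Average (LLM - expert) score; a positive value means the LLM scored higher."""
    if len(llm) != len(expert) or not llm:
        raise ValueError("need two non-empty, equal-length lists")
    return sum(a - b for a, b in zip(llm, expert)) / len(llm)


if __name__ == "__main__":
    # Hypothetical per-page DISCERN totals (toy values, not study data).
    llm_scores = [52, 60, 45, 70, 58]
    expert_scores = [48, 55, 44, 66, 50]
    print(round(pearson_r(llm_scores, expert_scores), 2))
    print(mean_difference(llm_scores, expert_scores))
```

Reporting both numbers matters because a correlation alone is blind to a constant offset: an LLM that always scores 5 points above the experts would still have r = 1.0, while `mean_difference` would expose the upward bias.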

Conclusions: This study demonstrates the potential of LLM-assisted evaluation for assessing the quality of online health information, while highlighting the importance of human expertise. LLMs could efficiently process large volumes of health information, but they should complement rather than replace human insight in comprehensive assessments. These findings have implications for improving the accessibility and reliability of breast cancer treatment information.
