Hui Liu, Jialun Peng, Lu Li, Ao Deng, XiangXin Huang, Guobing Yin, Haojun Luo
{"title":"大型语言模型作为中国乳腺癌患者和专家咨询热线:横断面问卷研究。","authors":"Hui Liu, Jialun Peng, Lu Li, Ao Deng, XiangXin Huang, Guobing Yin, Haojun Luo","doi":"10.2196/66429","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The disease burden of breast cancer is increasing in China. Guiding people to obtain accurate information on breast cancer and improving the public's health literacy are crucial for the early detection and timely treatment of breast cancer. Large language model (LLM) is a currently popular source of health information. However, the accuracy and practicality of the breast cancer-related information provided by LLMs have not yet been evaluated.</p><p><strong>Objective: </strong>This study aims to evaluate and compare the accuracy, practicality, and generalization-specificity of responses to breast cancer-related questions from two LLMs, ChatGPT and ERNIE Bot (EB).</p><p><strong>Methods: </strong>The questions asked to the LLMs consisted of a patient questionnaire and an expert questionnaire, each containing 15 questions. ChatGPT was queried in both Chinese and English, recorded as ChatGPT-Chinese (ChatGPT-C) and ChatGPT-English (ChatGPT-E) respectively, while EB was queried in Chinese. The accuracy, practicality, and generalization-specificity of each inquiry's responses were rated by a breast cancer multidisciplinary treatment team using Likert scales.</p><p><strong>Results: </strong>Overall, for both the patient and expert questionnaire, the accuracy and practicality of responses from ChatGPT-E were significantly higher than those from ChatGPT-C and EB (all Ps<.001). However, the responses from all LLMs are relatively generalized, leading to lower accuracy and practicality for the expert questionnaire compared to the patient questionnaire. Additionally, there were issues such as the lack of supporting evidence and potential ethical risks in the responses of LLMs.</p><p><strong>Conclusions: </strong>Currently, compared to other LLMs, ChatGPT-E has demonstrated greater potential for application in educating Chinese patients with breast cancer, and may serve as an effective tool for them to obtain health information. However, for breast cancer specialists, these LLMs are not yet suitable for assisting in clinical diagnosis or treatment activities. Additionally, data security, ethical, and legal risks associated with using LLMs in clinical practice cannot be ignored. In the future, further research is needed to determine the true efficacy of LLMs in clinical scenarios related to breast cancer in China.</p>","PeriodicalId":56334,"journal":{"name":"JMIR Medical Informatics","volume":"13 ","pages":"e66429"},"PeriodicalIF":3.1000,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large Language Models as a Consulting Hotline for Patients With Breast Cancer and Specialists in China: Cross-Sectional Questionnaire Study.\",\"authors\":\"Hui Liu, Jialun Peng, Lu Li, Ao Deng, XiangXin Huang, Guobing Yin, Haojun Luo\",\"doi\":\"10.2196/66429\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>The disease burden of breast cancer is increasing in China. Guiding people to obtain accurate information on breast cancer and improving the public's health literacy are crucial for the early detection and timely treatment of breast cancer. Large language model (LLM) is a currently popular source of health information. 
However, the accuracy and practicality of the breast cancer-related information provided by LLMs have not yet been evaluated.</p><p><strong>Objective: </strong>This study aims to evaluate and compare the accuracy, practicality, and generalization-specificity of responses to breast cancer-related questions from two LLMs, ChatGPT and ERNIE Bot (EB).</p><p><strong>Methods: </strong>The questions asked to the LLMs consisted of a patient questionnaire and an expert questionnaire, each containing 15 questions. ChatGPT was queried in both Chinese and English, recorded as ChatGPT-Chinese (ChatGPT-C) and ChatGPT-English (ChatGPT-E) respectively, while EB was queried in Chinese. The accuracy, practicality, and generalization-specificity of each inquiry's responses were rated by a breast cancer multidisciplinary treatment team using Likert scales.</p><p><strong>Results: </strong>Overall, for both the patient and expert questionnaire, the accuracy and practicality of responses from ChatGPT-E were significantly higher than those from ChatGPT-C and EB (all Ps<.001). However, the responses from all LLMs are relatively generalized, leading to lower accuracy and practicality for the expert questionnaire compared to the patient questionnaire. Additionally, there were issues such as the lack of supporting evidence and potential ethical risks in the responses of LLMs.</p><p><strong>Conclusions: </strong>Currently, compared to other LLMs, ChatGPT-E has demonstrated greater potential for application in educating Chinese patients with breast cancer, and may serve as an effective tool for them to obtain health information. However, for breast cancer specialists, these LLMs are not yet suitable for assisting in clinical diagnosis or treatment activities. Additionally, data security, ethical, and legal risks associated with using LLMs in clinical practice cannot be ignored. In the future, further research is needed to determine the true efficacy of LLMs in clinical scenarios related to breast cancer in China.</p>\",\"PeriodicalId\":56334,\"journal\":{\"name\":\"JMIR Medical Informatics\",\"volume\":\"13 \",\"pages\":\"e66429\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR Medical Informatics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2196/66429\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Medical Informatics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/66429","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
Large Language Models as a Consulting Hotline for Patients With Breast Cancer and Specialists in China: Cross-Sectional Questionnaire Study.
Background: The disease burden of breast cancer is increasing in China. Guiding people to accurate information on breast cancer and improving the public's health literacy are crucial for the early detection and timely treatment of the disease. Large language models (LLMs) are a currently popular source of health information. However, the accuracy and practicality of the breast cancer-related information provided by LLMs have not yet been evaluated.
Objective: This study aims to evaluate and compare the accuracy, practicality, and generalization-specificity of responses to breast cancer-related questions from two LLMs, ChatGPT and ERNIE Bot (EB).
Methods: The questions posed to the LLMs came from a patient questionnaire and an expert questionnaire, each containing 15 questions. ChatGPT was queried in both Chinese and English, recorded as ChatGPT-Chinese (ChatGPT-C) and ChatGPT-English (ChatGPT-E), respectively, while EB was queried in Chinese. The accuracy, practicality, and generalization-specificity of the responses to each query were rated by a breast cancer multidisciplinary treatment team using Likert scales.
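The abstract does not describe how the queries were issued or how the Likert ratings were recorded. As a minimal illustrative sketch only (not the authors' actual pipeline), the evaluation could be organized along the following lines; the model labels, the `query_fn` helper, and the record structure are hypothetical placeholders.

```python
# Illustrative sketch only: the abstract does not detail the authors' querying
# or rating workflow, so the helper and data structures here are assumptions.
from dataclasses import dataclass, field

@dataclass
class RatedResponse:
    model: str            # "ChatGPT-E", "ChatGPT-C", or "EB"
    questionnaire: str    # "patient" or "expert"
    question: str
    answer: str
    accuracy: list = field(default_factory=list)      # Likert ratings from the MDT raters
    practicality: list = field(default_factory=list)
    specificity: list = field(default_factory=list)   # generalization-specificity ratings

def collect_responses(questions, model, language, query_fn):
    """Query one model in one language and return unrated response records.

    questions: {"patient": [...15 items...], "expert": [...15 items...]}
    query_fn:  hypothetical callable wrapping the model's chat interface
    """
    records = []
    for questionnaire, items in questions.items():
        for q in items:
            answer = query_fn(model=model, language=language, prompt=q)
            records.append(RatedResponse(model=model, questionnaire=questionnaire,
                                         question=q, answer=answer))
    return records
```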
Results: Overall, for both the patient and expert questionnaires, the accuracy and practicality of responses from ChatGPT-E were significantly higher than those from ChatGPT-C and EB (all P<.001). However, the responses from all LLMs were relatively generalized, leading to lower accuracy and practicality for the expert questionnaire than for the patient questionnaire. Additionally, the LLMs' responses showed issues such as a lack of supporting evidence and potential ethical risks.
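The abstract reports significance (all P<.001) without naming the statistical test used. Purely as a hedged illustration, ordinal Likert ratings across three models are often compared with a nonparametric test such as Kruskal-Wallis; the sketch below uses made-up numbers, not study data.

```python
# Hypothetical analysis sketch: the test and the data below are assumptions,
# not the authors' reported analysis.
from scipy.stats import kruskal

def compare_models(ratings_by_model):
    """ratings_by_model: {"ChatGPT-E": [...], "ChatGPT-C": [...], "EB": [...]}"""
    return kruskal(*ratings_by_model.values())

demo = {"ChatGPT-E": [5, 5, 4, 5, 5], "ChatGPT-C": [3, 4, 3, 3, 4], "EB": [3, 3, 4, 3, 3]}
stat, p = compare_models(demo)
print(f"Kruskal-Wallis H = {stat:.2f}, P = {p:.4f}")
```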
Conclusions: Currently, compared with the other LLMs, ChatGPT-E has shown greater potential for educating Chinese patients with breast cancer and may serve as an effective tool for them to obtain health information. For breast cancer specialists, however, these LLMs are not yet suitable for assisting in clinical diagnosis or treatment. Additionally, the data security, ethical, and legal risks associated with using LLMs in clinical practice cannot be ignored. Further research is needed to determine the true efficacy of LLMs in breast cancer-related clinical scenarios in China.
Journal description:
JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals.
Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (2016 Impact Factor: 5.175), JMIR Med Inform has a slightly different scope: it places more emphasis on applications for clinicians and health professionals than on consumers/citizens (the focus of JMIR), publishes even faster, and accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.