Zichen Ye, Bo Zhang, Kun Zhang, María José González Méndez, Huijiao Yan, Tong Wu, Yimin Qu, Yu Jiang, Peng Xue, Youlin Qiao
{"title":"评估 ChatGPT 对宫颈癌和乳腺癌常见问题的答复。","authors":"Zichen Ye, Bo Zhang, Kun Zhang, María José González Méndez, Huijiao Yan, Tong Wu, Yimin Qu, Yu Jiang, Peng Xue, Youlin Qiao","doi":"10.1186/s12905-024-03320-8","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Cervical cancer (CC) and breast cancer (BC) threaten women's well-being, influenced by health-related stigma and a lack of reliable information, which can cause late diagnosis and early death. ChatGPT is likely to become a key source of health information, although quality concerns could also influence health-seeking behaviours.</p><p><strong>Methods: </strong>This cross-sectional online survey compared ChatGPT's responses to five physicians specializing in mammography and five specializing in gynaecology. Twenty frequently asked questions about CC and BC were asked on 26th and 29th of April, 2023. A panel of seven experts assessed the accuracy, consistency, and relevance of ChatGPT's responses using a 7-point Likert scale. Responses were analyzed for readability, reliability, and efficiency. ChatGPT's responses were synthesized, and findings are presented as a radar chart.</p><p><strong>Results: </strong>ChatGPT had an accuracy score of 7.0 (range: 6.6-7.0) for CC and BC questions, surpassing the highest-scoring physicians (P < 0.05). ChatGPT took an average of 13.6 s (range: 7.6-24.0) to answer each of the 20 questions presented. Readability was comparable to that of experts and physicians involved, but ChatGPT generated more extended responses compared to physicians. The consistency of repeated answers was 5.2 (range: 3.4-6.7). With different contexts combined, the overall ChatGPT relevance score was 6.5 (range: 4.8-7.0). Radar plot analysis indicated comparably good accuracy, efficiency, and to a certain extent, relevance. However, there were apparent inconsistencies, and the reliability and readability be considered inadequate.</p><p><strong>Conclusions: </strong>ChatGPT shows promise as an initial source of information for CC and BC. ChatGPT is also highly functional and appears to be superior to physicians, and aligns with expert consensus, although there is room for improvement in readability, reliability, and consistency. Future efforts should focus on developing advanced ChatGPT models explicitly designed to improve medical practice and for those with concerns about symptoms.</p>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367894/pdf/","citationCount":"0","resultStr":"{\"title\":\"An assessment of ChatGPT's responses to frequently asked questions about cervical and breast cancer.\",\"authors\":\"Zichen Ye, Bo Zhang, Kun Zhang, María José González Méndez, Huijiao Yan, Tong Wu, Yimin Qu, Yu Jiang, Peng Xue, Youlin Qiao\",\"doi\":\"10.1186/s12905-024-03320-8\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Cervical cancer (CC) and breast cancer (BC) threaten women's well-being, influenced by health-related stigma and a lack of reliable information, which can cause late diagnosis and early death. ChatGPT is likely to become a key source of health information, although quality concerns could also influence health-seeking behaviours.</p><p><strong>Methods: </strong>This cross-sectional online survey compared ChatGPT's responses to five physicians specializing in mammography and five specializing in gynaecology. Twenty frequently asked questions about CC and BC were asked on 26th and 29th of April, 2023. A panel of seven experts assessed the accuracy, consistency, and relevance of ChatGPT's responses using a 7-point Likert scale. Responses were analyzed for readability, reliability, and efficiency. ChatGPT's responses were synthesized, and findings are presented as a radar chart.</p><p><strong>Results: </strong>ChatGPT had an accuracy score of 7.0 (range: 6.6-7.0) for CC and BC questions, surpassing the highest-scoring physicians (P < 0.05). ChatGPT took an average of 13.6 s (range: 7.6-24.0) to answer each of the 20 questions presented. Readability was comparable to that of experts and physicians involved, but ChatGPT generated more extended responses compared to physicians. The consistency of repeated answers was 5.2 (range: 3.4-6.7). With different contexts combined, the overall ChatGPT relevance score was 6.5 (range: 4.8-7.0). Radar plot analysis indicated comparably good accuracy, efficiency, and to a certain extent, relevance. However, there were apparent inconsistencies, and the reliability and readability be considered inadequate.</p><p><strong>Conclusions: </strong>ChatGPT shows promise as an initial source of information for CC and BC. ChatGPT is also highly functional and appears to be superior to physicians, and aligns with expert consensus, although there is room for improvement in readability, reliability, and consistency. Future efforts should focus on developing advanced ChatGPT models explicitly designed to improve medical practice and for those with concerns about symptoms.</p>\",\"PeriodicalId\":2,\"journal\":{\"name\":\"ACS Applied Bio Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2024-09-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11367894/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Bio Materials\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1186/s12905-024-03320-8\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATERIALS SCIENCE, BIOMATERIALS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12905-024-03320-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}
引用次数: 0
摘要
背景:宫颈癌(CC)和乳腺癌(BC)威胁着妇女的健康,与健康有关的污名化和缺乏可靠的信息会导致晚期诊断和早期死亡。ChatGPT 有可能成为一个重要的健康信息来源,尽管质量问题也会影响求医行为:这项横断面在线调查比较了 ChatGPT 与五位乳腺摄影专科医生和五位妇科专科医生的回复。调查于 2023 年 4 月 26 日和 29 日进行,共询问了 20 个有关 CC 和 BC 的常见问题。由七位专家组成的小组使用 7 点李克特量表对 ChatGPT 回答的准确性、一致性和相关性进行了评估。对回答的可读性、可靠性和效率进行了分析。对 ChatGPT 的回答进行了综合,结果以雷达图的形式呈现:结果:ChatGPT 在 CC 和 BC 问题上的准确率为 7.0 分(范围:6.6-7.0),超过了得分最高的医生(P 结论:ChatGPT 显示了作为初始工具的前景:ChatGPT 有望成为 CC 和 BC 的初始信息来源。ChatGPT 的功能性也很强,似乎优于医生,并与专家共识一致,但在可读性、可靠性和一致性方面仍有改进空间。未来的工作重点应该是开发先进的 ChatGPT 模型,明确用于改善医疗实践和那些对症状有疑虑的人。
An assessment of ChatGPT's responses to frequently asked questions about cervical and breast cancer.
Background: Cervical cancer (CC) and breast cancer (BC) threaten women's well-being, influenced by health-related stigma and a lack of reliable information, which can cause late diagnosis and early death. ChatGPT is likely to become a key source of health information, although quality concerns could also influence health-seeking behaviours.
Methods: This cross-sectional online survey compared ChatGPT's responses to five physicians specializing in mammography and five specializing in gynaecology. Twenty frequently asked questions about CC and BC were asked on 26th and 29th of April, 2023. A panel of seven experts assessed the accuracy, consistency, and relevance of ChatGPT's responses using a 7-point Likert scale. Responses were analyzed for readability, reliability, and efficiency. ChatGPT's responses were synthesized, and findings are presented as a radar chart.
Results: ChatGPT had an accuracy score of 7.0 (range: 6.6-7.0) for CC and BC questions, surpassing the highest-scoring physicians (P < 0.05). ChatGPT took an average of 13.6 s (range: 7.6-24.0) to answer each of the 20 questions presented. Readability was comparable to that of experts and physicians involved, but ChatGPT generated more extended responses compared to physicians. The consistency of repeated answers was 5.2 (range: 3.4-6.7). With different contexts combined, the overall ChatGPT relevance score was 6.5 (range: 4.8-7.0). Radar plot analysis indicated comparably good accuracy, efficiency, and to a certain extent, relevance. However, there were apparent inconsistencies, and the reliability and readability be considered inadequate.
Conclusions: ChatGPT shows promise as an initial source of information for CC and BC. ChatGPT is also highly functional and appears to be superior to physicians, and aligns with expert consensus, although there is room for improvement in readability, reliability, and consistency. Future efforts should focus on developing advanced ChatGPT models explicitly designed to improve medical practice and for those with concerns about symptoms.