A Comparison of Prostate Cancer Screening Information Quality on Standard and Advanced Versions of ChatGPT, Google Gemini, and Microsoft Copilot: A Cross-Sectional Study

IF 2.5 · CAS Tier 4 (Medicine) · JCR Q2 · PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH
American Journal of Health Promotion · Pub Date: 2025-06-01 · Epub Date: 2025-01-24 · DOI: 10.1177/08901171251316371
Otis L Owens, Michael Leonard
{"title":"标准版和高级版ChatGPT、谷歌Gemini和Microsoft Copilot的前列腺癌筛查信息质量比较:一项横断面研究","authors":"Otis L Owens, Michael Leonard","doi":"10.1177/08901171251316371","DOIUrl":null,"url":null,"abstract":"<p><p>PurposeArtificially Intelligent (AI) chatbots have the potential to produce information to support shared prostate cancer (PrCA) decision-making. Therefore, our purpose was to evaluate and compare the accuracy, completeness, readability, and credibility of responses from standard and advanced versions of popular chatbots: ChatGPT-3.5, ChatGPT-4.0, Microsoft Copilot, Microsoft Copilot Pro, Google Gemini, and Google Gemini Advanced. We also investigated whether prompting chatbots for low-literacy PrCA information would improve the readability of responses. Lastly, we determined if the responses were appropriate for African-American men, who have the worst PrCA outcomes.ApproachThe study used a cross-sectional approach to examine the quality of responses solicited from chatbots.ParticipantsThe study did not include human subjects.MethodEleven frequently asked PrCA questions, based on resources produced by the Centers for Disease Control and Prevention (CDC) and the American Cancer Society (ACS), were posed to each chatbot twice (once for low literacy populations). A coding/rating form containing questions with key points/answers from the ACS or CDC to facilitate the rating process. Accuracy and completeness were rated dichotomously (i.e., yes/no). Credibility was determined by whether a trustworthy medical or health-related organization was cited. Readability was determined using a Flesch-Kincaid readability score calculator that enabled chatbot responses to be entered individually. Average accuracy, completeness, credibility, and readability percentages or scores were calculated using Excel.ResultsAll chatbots were accurate, but the completeness, readability, and credibility of responses varied. Soliciting low-literacy responses significantly improved readability, but sometimes at the detriment of completeness. All chatbots recognized the higher PrCA risk in African-American men and tailored screening recommendations. Microsoft Copilot Pro had the best overall performance on standard screening questions. Microsoft Copilot outperformed other chatbots on responses for low literacy populations.ConclusionsAI chatbots are useful tools for learning about PrCA screening but should be combined with healthcare provider advice.</p>","PeriodicalId":7481,"journal":{"name":"American Journal of Health Promotion","volume":" ","pages":"766-776"},"PeriodicalIF":2.5000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Comparison of Prostate Cancer Screening Information Quality on Standard and Advanced Versions of ChatGPT, Google Gemini, and Microsoft Copilot: A Cross-Sectional Study.\",\"authors\":\"Otis L Owens, Michael Leonard\",\"doi\":\"10.1177/08901171251316371\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>PurposeArtificially Intelligent (AI) chatbots have the potential to produce information to support shared prostate cancer (PrCA) decision-making. Therefore, our purpose was to evaluate and compare the accuracy, completeness, readability, and credibility of responses from standard and advanced versions of popular chatbots: ChatGPT-3.5, ChatGPT-4.0, Microsoft Copilot, Microsoft Copilot Pro, Google Gemini, and Google Gemini Advanced. 
We also investigated whether prompting chatbots for low-literacy PrCA information would improve the readability of responses. Lastly, we determined if the responses were appropriate for African-American men, who have the worst PrCA outcomes.ApproachThe study used a cross-sectional approach to examine the quality of responses solicited from chatbots.ParticipantsThe study did not include human subjects.MethodEleven frequently asked PrCA questions, based on resources produced by the Centers for Disease Control and Prevention (CDC) and the American Cancer Society (ACS), were posed to each chatbot twice (once for low literacy populations). A coding/rating form containing questions with key points/answers from the ACS or CDC to facilitate the rating process. Accuracy and completeness were rated dichotomously (i.e., yes/no). Credibility was determined by whether a trustworthy medical or health-related organization was cited. Readability was determined using a Flesch-Kincaid readability score calculator that enabled chatbot responses to be entered individually. Average accuracy, completeness, credibility, and readability percentages or scores were calculated using Excel.ResultsAll chatbots were accurate, but the completeness, readability, and credibility of responses varied. Soliciting low-literacy responses significantly improved readability, but sometimes at the detriment of completeness. All chatbots recognized the higher PrCA risk in African-American men and tailored screening recommendations. Microsoft Copilot Pro had the best overall performance on standard screening questions. Microsoft Copilot outperformed other chatbots on responses for low literacy populations.ConclusionsAI chatbots are useful tools for learning about PrCA screening but should be combined with healthcare provider advice.</p>\",\"PeriodicalId\":7481,\"journal\":{\"name\":\"American Journal of Health Promotion\",\"volume\":\" \",\"pages\":\"766-776\"},\"PeriodicalIF\":2.5000,\"publicationDate\":\"2025-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Health Promotion\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1177/08901171251316371\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/24 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Health Promotion","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/08901171251316371","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/24 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH","Score":null,"Total":0}
Citations: 0

Abstract

Purpose: Artificially Intelligent (AI) chatbots have the potential to produce information to support shared prostate cancer (PrCA) decision-making. Therefore, our purpose was to evaluate and compare the accuracy, completeness, readability, and credibility of responses from standard and advanced versions of popular chatbots: ChatGPT-3.5, ChatGPT-4.0, Microsoft Copilot, Microsoft Copilot Pro, Google Gemini, and Google Gemini Advanced. We also investigated whether prompting chatbots for low-literacy PrCA information would improve the readability of responses. Lastly, we determined whether the responses were appropriate for African-American men, who have the worst PrCA outcomes.

Approach: The study used a cross-sectional approach to examine the quality of responses solicited from chatbots.

Participants: The study did not include human subjects.

Method: Eleven frequently asked PrCA questions, based on resources produced by the Centers for Disease Control and Prevention (CDC) and the American Cancer Society (ACS), were posed to each chatbot twice (once for low-literacy populations). A coding/rating form containing the questions, with key points/answers from the ACS or CDC, was used to facilitate the rating process. Accuracy and completeness were rated dichotomously (i.e., yes/no). Credibility was determined by whether a trustworthy medical or health-related organization was cited. Readability was determined using a Flesch-Kincaid readability score calculator into which chatbot responses were entered individually. Average accuracy, completeness, credibility, and readability percentages or scores were calculated using Excel.

Results: All chatbots were accurate, but the completeness, readability, and credibility of responses varied. Soliciting low-literacy responses significantly improved readability, but sometimes to the detriment of completeness. All chatbots recognized the higher PrCA risk in African-American men and tailored screening recommendations accordingly. Microsoft Copilot Pro had the best overall performance on standard screening questions. Microsoft Copilot outperformed other chatbots on responses for low-literacy populations.

Conclusions: AI chatbots are useful tools for learning about PrCA screening but should be combined with healthcare provider advice.
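For readers unfamiliar with the readability metric used in the Method: the Flesch-Kincaid Grade Level combines average sentence length with average syllables per word (0.39 × words/sentences + 11.8 × syllables/words − 15.59). The Python sketch below is illustrative only; the study used an existing online calculator, not this code, and the syllable counter here is a naive heuristic, so its output will only approximate published scores.

```python
# Illustrative sketch of Flesch-Kincaid Grade Level scoring.
# Not the calculator used in the study; syllable counting is a rough heuristic.
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # drop a likely silent trailing 'e'
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Grade level = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59


if __name__ == "__main__":
    # Hypothetical chatbot response, for demonstration only.
    response = (
        "Prostate cancer screening usually starts with a PSA blood test. "
        "Talk with your doctor about the benefits and risks before deciding."
    )
    print(f"Flesch-Kincaid grade level: {flesch_kincaid_grade(response):.1f}")
```

Lower grade-level scores indicate text that is easier to read, which is why prompting for low-literacy responses can improve this metric even when completeness suffers.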

Source Journal
American Journal of Health Promotion (PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH)
CiteScore: 4.40
Self-citation rate: 3.70%
Articles per year: 184
Journal description: The editorial goal of the American Journal of Health Promotion is to provide a forum for exchange among the many disciplines involved in health promotion and an interface between researchers and practitioners.