Leveraging artificial intelligence chatbots for anemia prevention: A comparative study of ChatGPT-3.5, Copilot, and Gemini outputs against Google Search results
Authors: Shinya Ito, Emi Furukawa, Tsuyoshi Okuhara, Hiroko Okada, Takahiro Kiuchi
Journal: PEC Innovation, Volume 6, Article 100390
Publication date: 2025-04-01
DOI: 10.1016/j.pecinn.2025.100390
URL: https://www.sciencedirect.com/science/article/pii/S2772628225000196
Citations: 0
Abstract
Aim
This study evaluated the understandability, actionability, and readability of text on anemia generated by artificial intelligence (AI) chatbots.
Methods
This cross-sectional study compared texts generated by ChatGPT-3.5, Microsoft Copilot, and Google Gemini at three levels: “normal,” “6th grade,” and “PEMAT-P version.” Additionally, texts retrieved from the top eight Google Search results for relevant keywords were included for comparison. All texts were written in Japanese. The Japanese version of the PEMAT-P was used to assess understandability and actionability, while jReadability was used for readability. A systematic comparison was conducted to identify the strengths and weaknesses of each source.
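For readers unfamiliar with PEMAT-P scoring, the method above can be sketched as follows. This is a minimal illustration, not the study's code: each PEMAT-P item is rated agree (1) or disagree (0), "not applicable" items are excluded, and the percentage of agreed items among applicable ones is the score, with ≥70 % commonly treated as adequate. The item names below are hypothetical stand-ins, not the actual Japanese PEMAT-P items.

```python
def pemat_score(ratings):
    """Percentage PEMAT-P score.

    ratings: dict mapping item name -> 1 (agree), 0 (disagree),
    or None (not applicable; excluded from the denominator).
    """
    applicable = [v for v in ratings.values() if v is not None]
    if not applicable:
        raise ValueError("no applicable items")
    return 100.0 * sum(applicable) / len(applicable)


# Illustrative ratings for one text (item names are hypothetical).
ratings = {
    "uses_plain_language": 1,
    "defines_medical_terms": 1,
    "uses_visual_aids": None,   # not applicable for plain text output
    "breaks_content_into_steps": 0,
}

score = pemat_score(ratings)
print(f"score: {score:.1f} %, adequate (>= 70 %): {score >= 70}")
```

Understandability and actionability are scored separately over their own item subsets; the ≥70 % threshold reported in the Results applies to each score.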
Results
Texts generated by Gemini at the 6th-grade level (n = 26, 86.7 %) and PEMAT-P version (n = 27, 90.0 %), as well as ChatGPT-3.5 at the normal level (n = 21, 80.8 %), achieved significantly higher scores (≥70 %) for understandability and actionability compared to Google Search results (n = 17, 25.4 %, p < 0.001). For readability, Copilot and Gemini texts demonstrated significantly higher percentages of "very readable" to "somewhat difficult" levels than texts retrieved from Google Search (p < 0.001 to p = 0.007).
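The group differences above can be illustrated with a two-proportion z-test on the share of texts meeting the ≥70 % threshold. This is a hedged sketch, not the paper's analysis code, and the group sizes (30 Gemini texts, 67 Google Search texts) are hypothetical counts chosen only to make the arithmetic concrete.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical counts: 26 of 30 Gemini 6th-grade texts vs. 17 of 67
# Google Search texts reaching the >= 70 % PEMAT-P threshold.
z, p = two_proportion_z(26, 30, 17, 67)
print(f"z = {z:.2f}, p = {p:.2g}")
```

With these illustrative counts the difference is highly significant (p well below 0.001), consistent in direction with the p < 0.001 reported above.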
Innovation
This study is the first to objectively and quantitatively evaluate the understandability and actionability of educational materials on anemia prevention. By utilizing PEMAT-P and jReadability, the study demonstrated the superiority of Gemini in terms of understandability and readability through measurable data. This innovative approach highlights the potential of AI chatbots as a novel method for providing public health information and addressing health disparities.
Conclusion
AI-generated texts on anemia were found to be more readable and easier to understand than traditional web-based texts, with Gemini demonstrating the highest level of understandability. Moving forward, prompt refinements will be needed so that AI chatbots better integrate the visual elements that support actionable guidance.