{"title":"AI聊天机器人作为STD信息的来源:可靠性和可读性研究。","authors":"Hüseyin Alperen Yıldız, Emrullah Söğütdelen","doi":"10.1007/s10916-025-02178-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) chatbots are increasingly used for medical inquiries, including sensitive topics like sexually transmitted diseases (STDs). However, concerns remain regarding the reliability and readability of the information they provide. This study aimed to assess the reliability and readability of AI chatbots in providing information on STDs. The key objectives were to determine (1) the reliability of STD-related information provided by AI chatbots, and (2) whether the readability of this information meets the recommended standarts for patient education materials.</p><p><strong>Methods: </strong>Eleven relevant STD-related search queries were identified using Google Trends and entered into four AI chatbots: ChatGPT, Gemini, Perplexity, and Copilot. The reliability of the responses was evaluated using established tools, including DISCERN, EQIP, JAMA, and GQS. Readability was assessed using six widely recognized metrics, such as the Flesch-Kincaid Grade Level and the Gunning Fog Index. The performance of chatbots was statistically compared in terms of reliability and readability.</p><p><strong>Results: </strong>The analysis revealed significant differences in reliability across the AI chatbots. Perplexity and Copilot consistently outperformed ChatGPT and Gemini in DISCERN and EQIP scores, suggesting that these two chatbots provided more reliable information. However, results showed that none of the chatbots achieved the 6th-grade readability standard. All the chatbots generated information that was too complex for the general public, especially for individuals with lower health literacy levels.</p><p><strong>Conclusion: </strong>While Perplexity and Copilot showed better reliability in providing STD-related information, none of the chatbots met the recommended readability benchmarks. These findings highlight the need for future improvements in both the accuracy and accessibility of AI-generated health information, ensuring it can be easily understood by a broader audience.</p>","PeriodicalId":16338,"journal":{"name":"Journal of Medical Systems","volume":"49 1","pages":"43"},"PeriodicalIF":3.5000,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11968469/pdf/","citationCount":"0","resultStr":"{\"title\":\"AI Chatbots as Sources of STD Information: A Study on Reliability and Readability.\",\"authors\":\"Hüseyin Alperen Yıldız, Emrullah Söğütdelen\",\"doi\":\"10.1007/s10916-025-02178-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Artificial intelligence (AI) chatbots are increasingly used for medical inquiries, including sensitive topics like sexually transmitted diseases (STDs). However, concerns remain regarding the reliability and readability of the information they provide. This study aimed to assess the reliability and readability of AI chatbots in providing information on STDs. 
The key objectives were to determine (1) the reliability of STD-related information provided by AI chatbots, and (2) whether the readability of this information meets the recommended standarts for patient education materials.</p><p><strong>Methods: </strong>Eleven relevant STD-related search queries were identified using Google Trends and entered into four AI chatbots: ChatGPT, Gemini, Perplexity, and Copilot. The reliability of the responses was evaluated using established tools, including DISCERN, EQIP, JAMA, and GQS. Readability was assessed using six widely recognized metrics, such as the Flesch-Kincaid Grade Level and the Gunning Fog Index. The performance of chatbots was statistically compared in terms of reliability and readability.</p><p><strong>Results: </strong>The analysis revealed significant differences in reliability across the AI chatbots. Perplexity and Copilot consistently outperformed ChatGPT and Gemini in DISCERN and EQIP scores, suggesting that these two chatbots provided more reliable information. However, results showed that none of the chatbots achieved the 6th-grade readability standard. All the chatbots generated information that was too complex for the general public, especially for individuals with lower health literacy levels.</p><p><strong>Conclusion: </strong>While Perplexity and Copilot showed better reliability in providing STD-related information, none of the chatbots met the recommended readability benchmarks. These findings highlight the need for future improvements in both the accuracy and accessibility of AI-generated health information, ensuring it can be easily understood by a broader audience.</p>\",\"PeriodicalId\":16338,\"journal\":{\"name\":\"Journal of Medical Systems\",\"volume\":\"49 1\",\"pages\":\"43\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-04-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11968469/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Medical Systems\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1007/s10916-025-02178-z\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"HEALTH CARE SCIENCES & SERVICES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Systems","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s10916-025-02178-z","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
AI Chatbots as Sources of STD Information: A Study on Reliability and Readability.
Background: Artificial intelligence (AI) chatbots are increasingly used for medical inquiries, including sensitive topics like sexually transmitted diseases (STDs). However, concerns remain regarding the reliability and readability of the information they provide. This study aimed to assess the reliability and readability of AI chatbots in providing information on STDs. The key objectives were to determine (1) the reliability of STD-related information provided by AI chatbots, and (2) whether the readability of this information meets the recommended standards for patient education materials.
Methods: Eleven STD-related search queries were identified using Google Trends and entered into four AI chatbots: ChatGPT, Gemini, Perplexity, and Copilot. The reliability of the responses was evaluated using established tools: DISCERN, EQIP, JAMA, and GQS. Readability was assessed using six widely recognized metrics, including the Flesch-Kincaid Grade Level and the Gunning Fog Index. The chatbots were then statistically compared on both reliability and readability.
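For context, the two readability formulas named above are standard and publicly documented: Flesch-Kincaid Grade Level = 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59, and Gunning Fog Index = 0.4 * [(words per sentence) + 100 * (complex words / words)], where "complex" words have three or more syllables. The Python sketch below is an illustrative implementation of these two formulas only, not the study's analysis code; the vowel-group syllable counter is a rough heuristic, and all function and variable names are ours.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable heuristic: count groups of consecutive vowels,
    discounting a trailing silent 'e'. Real readability tools use
    pronunciation dictionaries or more careful rules."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def readability_scores(text: str) -> dict:
    """Compute the Flesch-Kincaid Grade Level and Gunning Fog Index."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    # Gunning Fog treats words of three or more syllables as "complex".
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)

    words_per_sentence = len(words) / len(sentences)
    fkgl = 0.39 * words_per_sentence + 11.8 * (syllables / len(words)) - 15.59
    fog = 0.4 * (words_per_sentence + 100 * complex_words / len(words))
    return {"flesch_kincaid_grade": fkgl, "gunning_fog": fog}

if __name__ == "__main__":
    sample = ("Chlamydia is a common sexually transmitted infection. "
              "It is usually treated with antibiotics prescribed by a clinician.")
    for name, value in readability_scores(sample).items():
        # Patient education materials are commonly recommended to read
        # at or below a 6th-grade level.
        verdict = "meets 6th-grade target" if value <= 6 else "too complex"
        print(f"{name}: {value:.1f} ({verdict})")
```

Both formulas grow with sentence length and word length, which is why AI-generated medical text, typically rich in long clinical terms, tends to score well above the 6th-grade target.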
Results: The analysis revealed significant differences in reliability across the AI chatbots. Perplexity and Copilot consistently outperformed ChatGPT and Gemini on DISCERN and EQIP scores, suggesting that these two chatbots provided more reliable information. However, none of the chatbots achieved the recommended 6th-grade readability standard: all four generated information too complex for the general public, particularly for individuals with lower health literacy.
Conclusion: While Perplexity and Copilot showed better reliability in providing STD-related information, none of the chatbots met the recommended readability benchmarks. These findings highlight the need for future improvements in both the accuracy and accessibility of AI-generated health information, ensuring it can be easily understood by a broader audience.
Journal Introduction:
Journal of Medical Systems provides a forum for the presentation and discussion of the increasingly extensive applications of new systems techniques and methods in hospital, clinic, and physician's office administration; pathology, radiology, and pharmaceutical delivery systems; medical records storage and retrieval; and ancillary patient-support systems. The journal publishes informative articles, essays, and studies across the entire scale of medical systems, from large hospital programs to novel small-scale medical services. Education is an integral part of this amalgamation of sciences, and selected articles are published in this area. Since existing medical systems are constantly being modified to fit particular circumstances and to solve specific problems, the journal includes a special section devoted to status reports on current installations.