Chatbots' performance in premature ejaculation questions: a comparative analysis of reliability, readability, and understandability.

IF 2.5 · CAS Medicine Tier 3 · JCR Q2 · Urology & Nephrology
S Gonultas, S Kardas, M Gelmis, A H Kinik, M Ozalevli, M G Kose, S Sulejman, S Yentur, B Arslan
{"title":"Chatbots' performance in premature ejaculation questions: a comparative analysis of reliability, readability, and understandability.","authors":"S Gonultas, S Kardas, M Gelmis, A H Kinik, M Ozalevli, M G Kose, S Sulejman, S Yentur, B Arslan","doi":"10.1038/s41443-025-01179-3","DOIUrl":null,"url":null,"abstract":"<p><p>This study aimed to evaluate the reliability, readability, and understandability of chatbot responses to frequently asked questions about premature ejaculation, and to assess the contributions, potential risks, and limitations of artificial intelligence. Fifteen questions were selected using data from Google Trends and posed to the chatbots Copilot, Gemini, ChatGPT4o, ChatGPT4oPlus, and DeepSeek-R1. Reliability was evaluated using the Global Quality Scale(GQS) by two experts, readability was assessed with the Flesch Kincaid Reading Ease(FKRE), Flesch Kincaid Grade Level(FKGL), Gunning Fog Index(GFI), and Simple Measure of Gobbledygook(SMOG), and understandability was evaluated using the Patient Educational Materials Assessment Tool for Printable Materials(PEMAT-P). Additionally, the consistency of source citations was examined. The GQS were as follows: Copilot: 3.96 ± 0.66, Gemini: 3.66 ± 0.78, ChatGPT4o: 4.83 ± 0.23, ChatGPT4oPlus: 4.83 ± 0.29, DeepSeek-R1:4.86 ± 0.22 (p < 0.001). The PEMAT-P were as follows: Copilot: 0.70 ± 0.05, Gemini: 0.72 ± 0.04, ChatGPT4o: 0.83 ± 0.03, ChatGPT4oPlus: 0.77 ± 0.06, DeepSeek-R1:0.79 ± 0.06 (p < 0.001). While ChatGPT4oPlus and DeepSeek-R1 scored higher for reliability and understandability, all chatbots performed at an acceptable level (≥70%). However, readability scores were above the recommended level for the target audience. Instances of low reliability or unverified sources were noted, with no significant differences between the chatbots. Chatbots provide highly reliable and informative responses regarding premature ejaculation; however, it is evident that there are significant limitations that require improvement, particularly concerning readability and the reliability of sources.</p>","PeriodicalId":14068,"journal":{"name":"International Journal of Impotence Research","volume":" ","pages":""},"PeriodicalIF":2.5000,"publicationDate":"2025-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Impotence Research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41443-025-01179-3","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract

This study aimed to evaluate the reliability, readability, and understandability of chatbot responses to frequently asked questions about premature ejaculation, and to assess the contributions, potential risks, and limitations of artificial intelligence. Fifteen questions were selected using data from Google Trends and posed to the chatbots Copilot, Gemini, ChatGPT4o, ChatGPT4oPlus, and DeepSeek-R1. Reliability was evaluated by two experts using the Global Quality Scale (GQS); readability was assessed with the Flesch Kincaid Reading Ease (FKRE), Flesch Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), and Simple Measure of Gobbledygook (SMOG); and understandability was evaluated using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P). Additionally, the consistency of source citations was examined. The GQS scores were as follows: Copilot: 3.96 ± 0.66, Gemini: 3.66 ± 0.78, ChatGPT4o: 4.83 ± 0.23, ChatGPT4oPlus: 4.83 ± 0.29, DeepSeek-R1: 4.86 ± 0.22 (p < 0.001). The PEMAT-P scores were as follows: Copilot: 0.70 ± 0.05, Gemini: 0.72 ± 0.04, ChatGPT4o: 0.83 ± 0.03, ChatGPT4oPlus: 0.77 ± 0.06, DeepSeek-R1: 0.79 ± 0.06 (p < 0.001). While ChatGPT4oPlus and DeepSeek-R1 scored higher for reliability and understandability, all chatbots performed at an acceptable level (≥70%). However, readability scores were above the recommended level for the target audience. Instances of low reliability or unverified sources were noted, with no significant differences between the chatbots. Chatbots provide highly reliable and informative responses regarding premature ejaculation; however, significant limitations remain, particularly concerning readability and the reliability of sources.
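For readers unfamiliar with the four readability indices named above, the sketch below shows their standard published formulas. The formulas themselves are the well-known ones; the function name and the choice to take pre-computed word, sentence, syllable, and polysyllable counts as inputs (rather than parsing text, since syllable counting is heuristic) are illustrative assumptions, not the study's actual implementation.

```python
from math import sqrt

def readability_scores(words: int, sentences: int, syllables: int,
                       polysyllables: int) -> dict:
    """Standard readability formulas applied to pre-computed counts
    (illustrative sketch, not the study's implementation).

    polysyllables: words with three or more syllables; used here as the
    'complex word' count for both GFI and SMOG. GFI's formal definition
    excludes proper nouns and some compound/suffixed words, which this
    simplification ignores.
    """
    wps = words / sentences   # average words per sentence
    spw = syllables / words   # average syllables per word

    # Flesch Kincaid Reading Ease: higher = easier (60-70 is roughly plain English)
    fkre = 206.835 - 1.015 * wps - 84.6 * spw

    # Flesch Kincaid Grade Level: U.S. school grade needed to read the text
    fkgl = 0.39 * wps + 11.8 * spw - 15.59

    # Gunning Fog Index: years of schooling needed to follow the text
    gfi = 0.4 * (wps + 100 * polysyllables / words)

    # SMOG grade: polysyllable density, normalized to a 30-sentence sample
    smog = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291

    return {"FKRE": fkre, "FKGL": fkgl, "GFI": gfi, "SMOG": smog}

# Hypothetical example: a 300-word chatbot reply in 15 sentences,
# with 480 syllables and 45 polysyllabic words
print(readability_scores(words=300, sentences=15, syllables=480, polysyllables=45))
```

Commonly cited recommendations put patient education materials at around a sixth-grade reading level, so grade-level scores well above that are what the abstract means by "above the recommended level." PEMAT-P, by contrast, is scored as the proportion of applicable checklist items rated "agree," which is why its results read as fractions against a 0.70 acceptability cut-off.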

Source Journal
International Journal of Impotence Research (Medicine: Urology & Nephrology)
CiteScore: 4.90
Self-citation rate: 19.20%
Annual articles: 140
Review time: >12 weeks
Journal Description: International Journal of Impotence Research: The Journal of Sexual Medicine addresses sexual medicine for both genders as an interdisciplinary field. Its readership includes basic science researchers, urologists, endocrinologists, cardiologists, family practitioners, gynecologists, internists, neurologists, psychiatrists, psychologists, radiologists and other health care clinicians.