Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones.

IF 2.9 | CAS Tier 2 (Medicine) | JCR Q1, UROLOGY & NEPHROLOGY
Mehmet Fatih Şahin, Erdem Can Topkaç, Çağrı Doğan, Serkan Şeramet, Rıdvan Özcan, Murat Akgül, Cenk Murat Yazıcı
{"title":"还在只用聊天机器人?比较了五种不同人工智能聊天机器人对有关肾结石最常见问题的回答。","authors":"Mehmet Fatih Şahin, Erdem Can Topkaç, Çağrı Doğan, Serkan Şeramet, Rıdvan Özcan, Murat Akgül, Cenk Murat Yazıcı","doi":"10.1089/end.2024.0474","DOIUrl":null,"url":null,"abstract":"<p><p><b><i>Objective:</i></b> To evaluate and compare the quality and comprehensibility of answers produced by five distinct artificial intelligence (AI) chatbots-GPT-4, Claude, Mistral, Google PaLM, and Grok-in response to the most frequently searched questions about kidney stones (KS). <b><i>Materials and Methods:</i></b> Google Trends facilitated the identification of pertinent terms related to KS. Each AI chatbot was provided with a unique sequence of 25 commonly searched phrases as input. The responses were assessed using DISCERN, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE) criteria. <b><i>Results:</i></b> The three most frequently searched terms were \"stone in kidney,\" \"kidney stone pain,\" and \"kidney pain.\" Nepal, India, and Trinidad and Tobago were the countries that performed the most searches in KS. None of the AI chatbots attained the requisite level of comprehensibility. Grok demonstrated the highest FKRE (55.6 ± 7.1) and lowest FKGL (10.0 ± 1.1) ratings (<i>p</i> = 0.001), whereas Claude outperformed the other chatbots in its DISCERN scores (47.6 ± 1.2) (<i>p</i> = 0.001). PEMAT-P understandability was the lowest in GPT-4 (53.2 ± 2.0), and actionability was the highest in Claude (61.8 ± 3.5) (<i>p</i> = 0.001). <b><i>Conclusion:</i></b> GPT-4 had the most complex language structure of the five chatbots, making it the most difficult to read and comprehend, whereas Grok was the simplest. Claude had the best KS text quality. Chatbot technology can improve healthcare material and make it easier to grasp.</p>","PeriodicalId":15723,"journal":{"name":"Journal of endourology","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones.\",\"authors\":\"Mehmet Fatih Şahin, Erdem Can Topkaç, Çağrı Doğan, Serkan Şeramet, Rıdvan Özcan, Murat Akgül, Cenk Murat Yazıcı\",\"doi\":\"10.1089/end.2024.0474\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><b><i>Objective:</i></b> To evaluate and compare the quality and comprehensibility of answers produced by five distinct artificial intelligence (AI) chatbots-GPT-4, Claude, Mistral, Google PaLM, and Grok-in response to the most frequently searched questions about kidney stones (KS). <b><i>Materials and Methods:</i></b> Google Trends facilitated the identification of pertinent terms related to KS. Each AI chatbot was provided with a unique sequence of 25 commonly searched phrases as input. The responses were assessed using DISCERN, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE) criteria. <b><i>Results:</i></b> The three most frequently searched terms were \\\"stone in kidney,\\\" \\\"kidney stone pain,\\\" and \\\"kidney pain.\\\" Nepal, India, and Trinidad and Tobago were the countries that performed the most searches in KS. 
None of the AI chatbots attained the requisite level of comprehensibility. Grok demonstrated the highest FKRE (55.6 ± 7.1) and lowest FKGL (10.0 ± 1.1) ratings (<i>p</i> = 0.001), whereas Claude outperformed the other chatbots in its DISCERN scores (47.6 ± 1.2) (<i>p</i> = 0.001). PEMAT-P understandability was the lowest in GPT-4 (53.2 ± 2.0), and actionability was the highest in Claude (61.8 ± 3.5) (<i>p</i> = 0.001). <b><i>Conclusion:</i></b> GPT-4 had the most complex language structure of the five chatbots, making it the most difficult to read and comprehend, whereas Grok was the simplest. Claude had the best KS text quality. Chatbot technology can improve healthcare material and make it easier to grasp.</p>\",\"PeriodicalId\":15723,\"journal\":{\"name\":\"Journal of endourology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-09-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of endourology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1089/end.2024.0474\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"UROLOGY & NEPHROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of endourology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1089/end.2024.0474","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"UROLOGY & NEPHROLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Objective: To evaluate and compare the quality and comprehensibility of answers produced by five distinct artificial intelligence (AI) chatbots (GPT-4, Claude, Mistral, Google PaLM, and Grok) in response to the most frequently searched questions about kidney stones (KS).

Materials and Methods: Google Trends was used to identify pertinent search terms related to KS. Each AI chatbot was provided with a unique sequence of 25 commonly searched phrases as input. The responses were assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE) criteria.

Results: The three most frequently searched terms were "stone in kidney," "kidney stone pain," and "kidney pain." Nepal, India, and Trinidad and Tobago were the countries with the most KS-related searches. None of the AI chatbots attained the requisite level of comprehensibility. Grok demonstrated the highest FKRE (55.6 ± 7.1) and lowest FKGL (10.0 ± 1.1) scores (p = 0.001), whereas Claude outperformed the other chatbots on DISCERN (47.6 ± 1.2) (p = 0.001). PEMAT-P understandability was lowest for GPT-4 (53.2 ± 2.0), and actionability was highest for Claude (61.8 ± 3.5) (p = 0.001).

Conclusion: GPT-4 used the most complex language structure of the five chatbots, making its answers the most difficult to read and comprehend, whereas Grok's were the simplest. Claude produced the best-quality KS text. Chatbot technology can improve healthcare material and make it easier to grasp.
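For readers curious about the automated part of this design, below is a minimal sketch of the two scriptable steps the abstract describes: feeding a search phrase to a chatbot and scoring the answer for readability. It assumes the open-source textstat package and the OpenAI Python client; only GPT-4 is shown (the paper also tested Claude, Mistral, Google PaLM, and Grok), the phrase list is truncated to the three terms named in the abstract, and DISCERN/PEMAT-P are omitted because they are human-rated instruments. This is an illustration under those assumptions, not the authors' actual pipeline.

```python
# Sketch: query one chatbot with searched phrases, then compute the two
# Flesch-Kincaid readability metrics used in the study. Illustrative only.
import textstat
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

# The abstract reports these as the three most frequently searched terms;
# the full study used 25 such phrases.
phrases = ["stone in kidney", "kidney stone pain", "kidney pain"]

for phrase in phrases:
    # Submit the phrase exactly as a user would search it.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": phrase}],
    )
    answer = response.choices[0].message.content

    # FKRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words);
    # higher scores mean easier-to-read text.
    fkre = textstat.flesch_reading_ease(answer)

    # FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59;
    # approximates the US school grade needed to understand the text.
    fkgl = textstat.flesch_kincaid_grade(answer)

    print(f"{phrase!r}: FKRE={fkre:.1f}, FKGL={fkgl:.1f}")
```

Averaging such scores across all 25 phrases for each chatbot would yield the kind of per-model FKRE/FKGL comparison reported in the Results.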

Source Journal
Journal of Endourology (Medicine: Urology & Nephrology)
CiteScore: 5.50
Self-citation rate: 14.80%
Articles per year: 254
Review time: 1 month
Journal Introduction: Journal of Endourology, JE Case Reports, and Videourology are the leading peer-reviewed journal, case-reports publication, and innovative video-journal companion covering all aspects of minimally invasive urology research, applications, and clinical outcomes. The leading journal of minimally invasive urology for over 30 years, Journal of Endourology is the essential publication for practicing surgeons who want to keep up with the latest surgical technologies in endoscopic, laparoscopic, robotic, and image-guided procedures as they apply to benign and malignant diseases of the genitourinary tract. This flagship journal includes the companion video journal Videourology™ with every subscription. While Journal of Endourology remains focused on publishing rigorously peer-reviewed articles, Videourology accepts original videos containing material that has not been reported elsewhere, except in the form of an abstract or a conference presentation.
Journal of Endourology coverage includes:
- The latest laparoscopic, robotic, endoscopic, and image-guided techniques for treating both benign and malignant conditions
- Pioneering research articles
- Controversial cases in endourology
- Techniques in endourology with accompanying videos
- Reviews and epochs in endourology
- An endourology survey section covering relevant manuscripts published in other journals