Mehmet Fatih Şahin, Erdem Can Topkaç, Çağrı Doğan, Serkan Şeramet, Rıdvan Özcan, Murat Akgül, Cenk Murat Yazıcı
Journal of Endourology, pp. 1172-1177. DOI: 10.1089/end.2024.0474. Epub 2024-09-06; published 2024-11-01.
Still Using Only ChatGPT? The Comparison of Five Different Artificial Intelligence Chatbots' Answers to the Most Common Questions About Kidney Stones.
Objective: To evaluate and compare the quality and comprehensibility of answers produced by five distinct artificial intelligence (AI) chatbots (GPT-4, Claude, Mistral, Google PaLM, and Grok) in response to the most frequently searched questions about kidney stones (KS). Materials and Methods: Google Trends was used to identify pertinent search terms related to KS. Each AI chatbot was given the 25 most commonly searched phrases as input. The responses were assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P), the Flesch-Kincaid Grade Level (FKGL), and the Flesch-Kincaid Reading Ease (FKRE) criteria. Results: The three most frequently searched terms were "stone in kidney," "kidney stone pain," and "kidney pain." Nepal, India, and Trinidad and Tobago were the countries with the most KS-related searches. None of the AI chatbots attained the requisite level of comprehensibility. Grok had the highest FKRE (55.6 ± 7.1) and the lowest FKGL (10.0 ± 1.1) scores (p = 0.001), whereas Claude outperformed the other chatbots on DISCERN (47.6 ± 1.2) (p = 0.001). PEMAT-P understandability was lowest for GPT-4 (53.2 ± 2.0), and actionability was highest for Claude (61.8 ± 3.5) (p = 0.001). Conclusion: GPT-4 used the most complex language of the five chatbots, making its answers the hardest to read and comprehend, whereas Grok's were the simplest. Claude produced the highest-quality KS text. Chatbot technology can improve healthcare materials and make them easier to grasp.
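The two readability metrics reported above have standard published formulas. The sketch below shows how FKRE and FKGL are computed from words-per-sentence and syllables-per-word counts; the syllable counter here is a crude vowel-run heuristic (an assumption for illustration), and the abstract does not specify which tooling the study actually used.

```python
import re

def count_syllables(word: str) -> int:
    # Crude approximation: count runs of vowels; every word has at least one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid(text: str) -> tuple[float, float]:
    """Return (FKRE, FKGL) for an English text using the standard formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # average words per sentence
    spw = syllables / len(words)        # average syllables per word
    fkre = 206.835 - 1.015 * wps - 84.6 * spw  # Reading Ease: higher = easier
    fkgl = 0.39 * wps + 11.8 * spw - 15.59     # Grade Level: lower = easier
    return fkre, fkgl
```

Under these formulas, Grok's profile in the study (high FKRE, low FKGL) corresponds to shorter sentences and fewer syllables per word, i.e., plainer language.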
About the journal:
Journal of Endourology, JE Case Reports, and Videourology are the leading peer-reviewed journal, case reports publication, and innovative videojournal companion covering all aspects of minimally invasive urology research, applications, and clinical outcomes.
The leading journal of minimally invasive urology for over 30 years, Journal of Endourology is the essential publication for practicing surgeons who want to keep up with the latest surgical technologies in endoscopic, laparoscopic, robotic, and image-guided procedures as they apply to benign and malignant diseases of the genitourinary tract. This flagship journal includes the companion videojournal Videourology™ with every subscription. While Journal of Endourology remains focused on publishing rigorously peer-reviewed articles, Videourology accepts original videos containing material that has not been reported elsewhere, except in the form of an abstract or a conference presentation.
Journal of Endourology coverage includes:
The latest laparoscopic, robotic, endoscopic, and image-guided techniques for treating both benign and malignant conditions
Pioneering research articles
Controversial cases in endourology
Techniques in endourology with accompanying videos
Reviews and epochs in endourology
Endourology Survey: a section covering endourology-relevant manuscripts published in other journals.