How Useful are Current Chatbots Regarding Urology Patient Information? Comparison of the Ten Most Popular Chatbots' Responses About Female Urinary Incontinence.
Arzu Malak, Mehmet Fatih Şahin
Journal of Medical Systems, vol. 48, no. 1, article 102 (published 2024-11-13). DOI: 10.1007/s10916-024-02125-4
This research evaluates the readability and quality of patient information material about female urinary incontinence (fUI) produced by ten popular artificial intelligence (AI)-supported chatbots. We used the most recent versions of ten widely used chatbots: OpenAI's GPT-4, Claude-3 Sonnet, Grok 1.5, Mistral Large 2, Google PaLM 2, Meta's Llama 3, HuggingChat v0.8.4, Microsoft's Copilot, Gemini Advanced, and Perplexity. Prompts were created to generate texts about UI, stress-type UI, urge-type UI, and mixed-type UI. Quality was assessed with the modified Ensuring Quality Information for Patients (mEQIP) technique and QUEST (Quality Evaluating Scoring Tool), and readability was evaluated with the Average Reading Level Consensus (ARLC), the average of eight well-known readability formulas. Comparing the average scores, there were significant differences in mean mEQIP and QUEST scores across the ten chatbots (p = 0.049 and p = 0.018, respectively). Gemini received the highest mean mEQIP and QUEST scores, whereas Grok had the lowest. The chatbots also differed significantly in mean ARLC, word count, and sentence count (p = 0.047, p = 0.001, and p = 0.001, respectively). In terms of readability, Grok's responses were the easiest to read, while Mistral's were highly complex and difficult to understand. AI-supported chatbot technology needs to improve the readability and quality of the patient information it provides about female UI.
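A consensus reading level such as ARLC is simply the mean of several grade-level formulas applied to the same text. The abstract does not name the eight formulas ARLC combines, so the sketch below is only an illustration under stated assumptions: it averages five standard grade-level formulas (Flesch-Kincaid Grade Level, Gunning Fog, Coleman-Liau Index, Automated Readability Index, and SMOG), uses a rough vowel-group syllable counter, and exposes hypothetical helper names (`count_syllables`, `text_stats`, `grade_levels`, `consensus_grade`). Its scores will not match those of the tool used in the study.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count vowel groups, drop a silent final 'e', floor at 1."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Word, sentence, letter, and syllable counts used by the formulas below."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": max(len(sentences), 1),
        "words": max(len(words), 1),
        "letters": sum(sum(c.isalpha() for c in w) for w in words),
        "syllables": sum(syllables),
        # Simplification: every word with 3+ syllables counts as "complex".
        "polysyllables": sum(1 for s in syllables if s >= 3),
    }


def grade_levels(text: str) -> dict:
    """US grade-level estimates from five standard readability formulas."""
    st = text_stats(text)
    w, s = st["words"], st["sentences"]
    wps = w / s  # average words per sentence
    return {
        "flesch_kincaid": 0.39 * wps + 11.8 * st["syllables"] / w - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * st["polysyllables"] / w),
        "coleman_liau": 0.0588 * (100 * st["letters"] / w)
        - 0.296 * (100 * s / w)
        - 15.8,
        "ari": 4.71 * st["letters"] / w + 0.5 * wps - 21.43,
        "smog": 1.0430 * math.sqrt(st["polysyllables"] * 30 / s) + 3.1291,
    }


def consensus_grade(text: str) -> float:
    """Consensus reading level: the plain mean of the individual grade levels."""
    levels = grade_levels(text)
    return sum(levels.values()) / len(levels)


if __name__ == "__main__":
    sample = ("Stress incontinence is leakage of urine when you cough or sneeze. "
              "Pelvic floor exercises can help.")
    print(text_stats(sample)["words"], text_stats(sample)["sentences"])
    print(round(consensus_grade(sample), 1))
```

Averaging several formulas tends to smooth out the quirks of any single one (for example, Coleman-Liau relies on letter counts while Flesch-Kincaid relies on syllables); the word and sentence counts that the study also compared fall out of the same `text_stats` pass.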
About the journal:
Journal of Medical Systems provides a forum for the presentation and discussion of the increasingly extensive applications of new systems techniques and methods in hospital, clinic, and physician's office administration; pathology, radiology, and pharmaceutical delivery systems; medical records storage and retrieval; and ancillary patient-support systems. The journal publishes informative articles, essays, and studies across the entire scale of medical systems, from large hospital programs to novel small-scale medical services. Education is an integral part of this amalgamation of sciences, and selected articles are published in this area. Since existing medical systems are constantly being modified to fit particular circumstances and to solve specific problems, the journal includes a special section devoted to status reports on current installations.