Evaluating AI Chatbot Responses to Postkidney Transplant Inquiries

IF 0.8 | Medicine, Quartile 4 | Q4 IMMUNOLOGY
Yihua Zhan , Xutao Chen , Feihong Ye , Zhikai Wu , Muhammad Usman , Zhihan Yuan , Han Wu , Jian Huang , Hao Yu
{"title":"评估人工智能聊天机器人对肾移植后询问的反应。","authors":"Yihua Zhan ,&nbsp;Xutao Chen ,&nbsp;Feihong Ye ,&nbsp;Zhikai Wu ,&nbsp;Muhammad Usman ,&nbsp;Zhihan Yuan ,&nbsp;Han Wu ,&nbsp;Jian Huang ,&nbsp;Hao Yu","doi":"10.1016/j.transproceed.2024.12.028","DOIUrl":null,"url":null,"abstract":"<div><div>This study evaluated the capability of three AI chatbots—ChatGPT 4.0, Claude 3.0, and Gemini Pro, as well as Google—in responding to common postkidney transplantation inquiries. We compiled a list of frequently asked postkidney transplant questions using Google and Bing. Response quality was rated on a 5-point Likert scale, while understandability and actionability were measured with the Patient Education Materials Assessment Tool (PEMAT). Readability was assessed using the Flesch Reading Ease and Flesch-Kincaid Grade Level metrics, with statistical analysis conducted via non-parametric tests, specifically the Kruskal-Wallis test, using SPSS. We gathered 127 questions, which were addressed by the chatbots and Google. The responses were of high quality (median Likert score: 4 [4,5]), good understandability (median PEMAT understandability score: 72.7% [62.5,77.8]), but poor actionability (median PEMAT operability score: 20% [0%-20%]). The readability was challenging (median Flesch Reading Ease score: 22.1 [8.7,34.8]), with a Flesch-Kincaid Grade Level akin to undergraduate-level text (median score: 14.7 [12.3,16.7]). Among the chatbots, Claude 3.0 provided the most reliable responses, though they required a higher reading level. ChatGPT 4.0 offered the most comprehensible responses. Moreover, Google did not outperform the chatbots in any of the scoring metrics.</div></div>","PeriodicalId":23246,"journal":{"name":"Transplantation proceedings","volume":"57 2","pages":"Pages 394-405"},"PeriodicalIF":0.8000,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating AI Chatbot Responses to Postkidney Transplant Inquiries\",\"authors\":\"Yihua Zhan ,&nbsp;Xutao Chen ,&nbsp;Feihong Ye ,&nbsp;Zhikai Wu ,&nbsp;Muhammad Usman ,&nbsp;Zhihan Yuan ,&nbsp;Han Wu ,&nbsp;Jian Huang ,&nbsp;Hao Yu\",\"doi\":\"10.1016/j.transproceed.2024.12.028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This study evaluated the capability of three AI chatbots—ChatGPT 4.0, Claude 3.0, and Gemini Pro, as well as Google—in responding to common postkidney transplantation inquiries. We compiled a list of frequently asked postkidney transplant questions using Google and Bing. Response quality was rated on a 5-point Likert scale, while understandability and actionability were measured with the Patient Education Materials Assessment Tool (PEMAT). Readability was assessed using the Flesch Reading Ease and Flesch-Kincaid Grade Level metrics, with statistical analysis conducted via non-parametric tests, specifically the Kruskal-Wallis test, using SPSS. We gathered 127 questions, which were addressed by the chatbots and Google. The responses were of high quality (median Likert score: 4 [4,5]), good understandability (median PEMAT understandability score: 72.7% [62.5,77.8]), but poor actionability (median PEMAT operability score: 20% [0%-20%]). The readability was challenging (median Flesch Reading Ease score: 22.1 [8.7,34.8]), with a Flesch-Kincaid Grade Level akin to undergraduate-level text (median score: 14.7 [12.3,16.7]). 
Among the chatbots, Claude 3.0 provided the most reliable responses, though they required a higher reading level. ChatGPT 4.0 offered the most comprehensible responses. Moreover, Google did not outperform the chatbots in any of the scoring metrics.</div></div>\",\"PeriodicalId\":23246,\"journal\":{\"name\":\"Transplantation proceedings\",\"volume\":\"57 2\",\"pages\":\"Pages 394-405\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2025-01-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transplantation proceedings\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0041134524006821\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"IMMUNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transplantation proceedings","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0041134524006821","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"IMMUNOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

This study evaluated the capability of three AI chatbots (ChatGPT 4.0, Claude 3.0, and Gemini Pro), as well as Google, in responding to common postkidney transplantation inquiries. We compiled a list of frequently asked postkidney transplant questions using Google and Bing. Response quality was rated on a 5-point Likert scale, while understandability and actionability were measured with the Patient Education Materials Assessment Tool (PEMAT). Readability was assessed using the Flesch Reading Ease and Flesch-Kincaid Grade Level metrics, with statistical analysis conducted via non-parametric tests, specifically the Kruskal-Wallis test, using SPSS. We gathered 127 questions, which were addressed by the chatbots and Google. The responses showed high quality (median Likert score: 4 [4,5]) and good understandability (median PEMAT understandability score: 72.7% [62.5,77.8]), but poor actionability (median PEMAT actionability score: 20% [0,20]). Readability was challenging (median Flesch Reading Ease score: 22.1 [8.7,34.8]), with a Flesch-Kincaid Grade Level akin to undergraduate-level text (median score: 14.7 [12.3,16.7]). Among the chatbots, Claude 3.0 provided the most reliable responses, though they required a higher reading level, while ChatGPT 4.0 offered the most comprehensible responses. Google did not outperform the chatbots in any of the scoring metrics.
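
The abstract names specific, reproducible instruments: the Flesch Reading Ease and Flesch-Kincaid Grade Level metrics and the Kruskal-Wallis test (the authors ran their analysis in SPSS). As a rough illustration of how those metrics and that test fit together, here is a minimal Python sketch using the third-party textstat and scipy packages; the response texts are hypothetical, and this is a sketch of the named methods, not the study's actual pipeline.

```python
# Illustrative sketch (not the authors' SPSS workflow): compute the readability
# metrics named in the abstract and compare sources with a Kruskal-Wallis test.
# Requires `pip install textstat scipy`; the sample responses are hypothetical.
import textstat
from scipy.stats import kruskal

# Hypothetical answers to post-transplant questions, keyed by source.
responses = {
    "ChatGPT 4.0": ["Take your immunosuppressants exactly as prescribed.",
                    "Report fever or reduced urine output to your team promptly."],
    "Claude 3.0":  ["Adherence to calcineurin inhibitor regimens reduces rejection risk.",
                    "Periodic surveillance of tacrolimus trough levels is recommended."],
    "Gemini Pro":  ["Attend regular follow-up visits and routine lab tests.",
                    "Avoid grapefruit, which interacts with some transplant drugs."],
    "Google":      ["Kidney transplant recipients need lifelong medication.",
                    "Drink enough water and watch for signs of infection."],
}

# Flesch Reading Ease (higher = easier to read) and Flesch-Kincaid Grade Level
# (approximate US school grade needed to understand the text) per response.
grade_levels = {src: [textstat.flesch_kincaid_grade(t) for t in texts]
                for src, texts in responses.items()}
ease_scores = {src: [textstat.flesch_reading_ease(t) for t in texts]
               for src, texts in responses.items()}
print("Flesch-Kincaid Grade Level:", grade_levels)
print("Flesch Reading Ease:", ease_scores)

# Kruskal-Wallis H test: non-parametric comparison of the grade-level
# distributions across the four sources, analogous to the abstract's analysis.
h_stat, p_value = kruskal(*grade_levels.values())
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.3f}")
```

In the study itself, all 127 questions were answered by each of the four sources, so the real per-source samples are far larger than in this toy example.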
Source journal: Transplantation Proceedings (Medicine - Immunology)
CiteScore: 1.70
Self-citation rate: 0.00%
Articles published: 502
Review time: 60 days
Journal description: Transplantation Proceedings publishes several different categories of manuscripts, all of which undergo extensive peer review by recognized authorities in the field prior to acceptance for publication. The first category consists of sets of papers providing an in-depth expression of the current state of the art in various rapidly developing components of world transplantation biology and medicine. These manuscripts emanate from congresses of the affiliated transplantation societies, from symposia sponsored by the societies, and from special conferences and workshops covering related topics. Transplantation Proceedings also publishes several special sections, including Clinical Transplantation Proceedings, which comprises rapid original contributions of preclinical and clinical experiences; these manuscripts undergo review by members of the Editorial Board. Original basic or clinical science articles, clinical trials, and case studies can be submitted to the journal's open access companion title, Transplantation Reports.