A cross-sectional study to assess response generated by ChatGPT and ChatSonic to patient queries about Epilepsy

Aditya Kumar Gudimella Tirumala, Shubham Mishra, Nritya Trivedi, Divya Shivakumar, Aradhya Singh, Sanobar Shariff
Telematics and Informatics Reports, Volume 13, Article 100110. Published 2023-12-12. DOI: 10.1016/j.teler.2023.100110


Objective

This article presents a study comparing the responses of two AI chatbots, ChatGPT and ChatSonic, to inquiries about epilepsy. ChatGPT and ChatSonic are among the most widely used AI chatbots and are broadly similar in their capabilities and limitations, though they differ in key respects such as training data, supported languages, and pricing model. The study aims to assess the potential application of AI in patient counseling and decision-making regarding epilepsy treatment.

Methods

The study categorized patient inquiries about epilepsy into two groups, patient counseling and judgment, and formulated ten questions within these categories. Two specialist physicians evaluated the reliability and accuracy of the chatbot replies using the Global Quality Scale (GQS) and a modified version of the DISCERN score (RS).

Results

Evaluator JC gave a median GQS of 4.5, and Evaluator VV a median GQS of 4.0; for RS, the medians were 5.0 (Evaluator JC) and 4.0 (Evaluator VV). The GQS scores from Evaluators JC and VV have a Spearman correlation coefficient of -0.531, indicating a negative correlation, with a p-value of 0.016, which is statistically significant. The RS scores, by contrast, have a correlation coefficient of 0.368, indicating a positive correlation, but the p-value of 0.110 is not statistically significant, so no relationship between the variables can be established. Weighted kappa was used to assess agreement between the evaluators. For GQS, the weighted kappa was -0.318 (95% CI: -0.570 to -0.065), allowing rejection of the null hypothesis: the ratings by Evaluator JC and Evaluator VV show a statistically significant negative agreement. For RS, however, the weighted kappa was 0.1327 (95% CI: -0.093 to 0.359), which fails to reject the null hypothesis: the ratings by the two evaluators are not significantly related, and no agreement between them can be established. These results suggest that both ChatGPT and ChatSonic have the potential to be valuable tools for epilepsy patients and their healthcare providers. However, it is worth noting that the two evaluators agreed more closely on the GQS scores than on the RS scores, suggesting that the GQS may be a more reliable measure of the quality of chatbot responses.
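The agreement statistics reported above (Spearman correlation and weighted kappa) can be reproduced with standard routines. The rating vectors below are hypothetical, since the abstract does not include the study's raw per-question scores; this is a sketch of the method, not the paper's data.

```python
# Sketch: computing Spearman correlation and weighted kappa for two raters.
# The score vectors are hypothetical 1-5 ratings over ten questions.
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

jc = [5, 4, 5, 4, 5, 3, 5, 4, 5, 4]  # hypothetical Evaluator JC ratings
vv = [3, 4, 3, 5, 4, 4, 3, 5, 4, 4]  # hypothetical Evaluator VV ratings

# Spearman rank correlation between the two raters, with p-value
rho, p = spearmanr(jc, vv)

# Weighted (linear) Cohen's kappa measures chance-corrected agreement
kappa = cohen_kappa_score(jc, vv, weights="linear")

print(f"Spearman rho={rho:.3f}, p={p:.3f}, weighted kappa={kappa:.3f}")
```

A kappa near zero (or negative, as for GQS in the study) indicates agreement no better than (or worse than) chance, which is why the RS kappa's confidence interval straddling zero fails to reject the null hypothesis.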

Conclusion

The findings underscore the importance of collaboration among policymakers, healthcare professionals, and AI designers to ensure the appropriate and safe use of AI chatbots in the healthcare domain. While AI chatbots can provide valuable information, it is crucial to acknowledge their limitations, including their reliance on training data and occasional factual errors. The study concludes by highlighting the need for further testing and validation of AI language models in the management of epilepsy.
