Evaluation of the accuracy and quality of ChatGPT-4 responses for hyperparathyroidism patients discussed at multidisciplinary endocrinology meetings.

Impact Factor: 2.9 · CAS Tier 3 (Medicine) · JCR Q2, Health Care Sciences & Services
DIGITAL HEALTH · Publication date: 2024-08-28 · eCollection date: 2024-01-01 · DOI: 10.1177/20552076241278692 · Open-access PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363241/pdf/
Işılay Taşkaldıran, Çağatay Emir Önder, Püren Gökbulut, Gönül Koç, Şerife Mehlika Kuşkonmaz
{"title":"Evaluation of the accuracy and quality of ChatGPT-4 responses for hyperparathyroidism patients discussed at multidisciplinary endocrinology meetings.","authors":"Işılay Taşkaldıran, Çağatay Emir Önder, Püren Gökbulut, Gönül Koç, Şerife Mehlika Kuşkonmaz","doi":"10.1177/20552076241278692","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Chat Generative Pre-trained Transformer (ChatGPT) is now utilized in various fields of healthcare in order to obtain answers to questions related to healthcare-related problems and to evaluate available information. Primary hyperparathyroidism is a common endocrine disorder. We aimed to evaluate the accuracy and quality of ChatGPT's responses to questions specific to hyperparathyroidism cases discussed at multidisciplinary endocrinology meetings.</p><p><strong>Methods: </strong>ChatGPT-4 was asked to respond to 10 hyperparathyroidism cases evaluated at multidisciplinary endocrinology meetings. The accuracy, completeness, and quality of the responses were scored independently by two endocrinologists. Accuracy and completeness were evaluated on the Likert scale, and quality was evaluated on the global quality scale (GQS).</p><p><strong>Results: </strong>No misleading information was detected in the responses. In terms of diagnosis, the mean accuracy scores (ranging from 1 to 5) were 4.9 ± 0.1 and the mean completeness scores (ranging from 1 to 3) were 3.0. In the responses given in terms of further examination, the mean accuracy and completeness scores were 4.8 ± 0.13 and 2.6 ± 0.16, respectively. The mean accuracy and completeness scores for treatment recommendations were 4.9 ± 0.1 and 2.4 ± 0.16, respectively. The GQS evaluation result was 80% high quality and 20% medium quality.</p><p><strong>Conclusion: </strong>In this study, the accuracy and quality rates of ChatGPT-4 were generally high in responding to questions as to hyperparathyroidism patients. It can be concluded that artificial intelligence may serve as a valuable tool in healthcare. However, the limitations and risks of ChatGPT should also be evaluated.</p>","PeriodicalId":51333,"journal":{"name":"DIGITAL HEALTH","volume":null,"pages":null},"PeriodicalIF":2.9000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11363241/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"DIGITAL HEALTH","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/20552076241278692","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0

Abstract

Purpose: Chat Generative Pre-trained Transformer (ChatGPT) is now used in many areas of healthcare to answer healthcare-related questions and to evaluate available information. Primary hyperparathyroidism is a common endocrine disorder. We aimed to evaluate the accuracy and quality of ChatGPT's responses to questions specific to hyperparathyroidism cases discussed at multidisciplinary endocrinology meetings.

Methods: ChatGPT-4 was asked to respond to 10 hyperparathyroidism cases evaluated at multidisciplinary endocrinology meetings. The accuracy, completeness, and quality of the responses were scored independently by two endocrinologists. Accuracy and completeness were rated on Likert scales, and quality was rated on the Global Quality Scale (GQS).
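
As an illustration of this scoring workflow, the sketch below shows one way the two raters' per-case Likert scores could be pooled into the summary statistics reported under Results. The data, the function name summarize, and the choice to average the two raters for each case are assumptions made for illustration, not the authors' analysis code.

```python
# Minimal sketch (assumed, not the authors' analysis code) of pooling two
# raters' per-case Likert scores into a mean ± standard error of the mean.
from statistics import mean, stdev

def summarize(rater1, rater2):
    """Average the two raters for each case, then return (mean, SEM) across cases."""
    per_case = [(a + b) / 2 for a, b in zip(rater1, rater2)]
    m = mean(per_case)
    sem = stdev(per_case) / len(per_case) ** 0.5
    return round(m, 2), round(sem, 2)

# Hypothetical accuracy scores (1-5) for the 10 cases in one domain.
rater1 = [5, 5, 4, 5, 5, 5, 5, 5, 5, 5]
rater2 = [5, 5, 5, 5, 4, 5, 5, 5, 5, 5]
print(summarize(rater1, rater2))  # e.g. (4.9, 0.07)
```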

Results: No misleading information was detected in the responses. For diagnosis, the mean accuracy score (scale 1-5) was 4.9 ± 0.1 and the mean completeness score (scale 1-3) was 3.0. For recommendations on further examination, the mean accuracy and completeness scores were 4.8 ± 0.13 and 2.6 ± 0.16, respectively. For treatment recommendations, the mean accuracy and completeness scores were 4.9 ± 0.1 and 2.4 ± 0.16, respectively. On the GQS, 80% of responses were rated high quality and 20% medium quality.
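
For reference, and under the assumption (not stated in the abstract) that the ± values are standard errors of the mean across the 10 cases, the reported 4.9 ± 0.1 for diagnostic accuracy is consistent with a pattern in which nine cases scored 5 and one scored 4:

```python
# Hypothetical score pattern consistent with the reported 4.9 ± 0.1 for
# diagnosis accuracy, assuming "±" denotes the SEM over the 10 cases.
from statistics import mean, stdev

scores = [5] * 9 + [4]                    # nine cases scored 5, one scored 4
sem = stdev(scores) / len(scores) ** 0.5  # sample SD 0.316 / sqrt(10) ≈ 0.1
print(mean(scores), round(sem, 2))        # -> 4.9 0.1
```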

Conclusion: In this study, the accuracy and quality of ChatGPT-4's responses to questions about hyperparathyroidism patients were generally high. Artificial intelligence may therefore serve as a valuable tool in healthcare; however, the limitations and risks of ChatGPT should also be evaluated.

Source journal: DIGITAL HEALTH
CiteScore: 2.90 · Self-citation rate: 7.70% · Articles published: 302