A Comparison of ChatGPT and Human Questionnaire Evaluations of the Urological Cancer Videos Most Watched on YouTube

Aykut Demirci
{"title":"A Comparison of ChatGPT and Human Questionnaire Evaluations of the Urological Cancer Videos Most Watched on YouTube","authors":"Aykut Demirci","doi":"10.1016/j.clgc.2024.102145","DOIUrl":null,"url":null,"abstract":"<div><h3>Aim</h3><p>To examine the reliability of ChatGPT in evaluating the quality of medical content of the most watched videos related to urological cancers on YouTube.</p></div><div><h3>Material and methods</h3><p>In March 2024 a playlist was created of the first 20 videos watched on YouTube for each type of urological cancer. The video texts were evaluated by ChatGPT and by a urology specialist using the DISCERN-5 and Global Quality Scale (GQS) questionnaires. The results obtained were compared using the Kruskal-Wallis test.</p></div><div><h3>Results</h3><p>For the prostate, bladder, renal, and testicular cancer videos, the median (IQR) DISCERN-5 scores given by the human evaluator and ChatGPT were (Human: 4 [1], 3 [0], 3 [2], 3 [1], <em>P</em> = .11; ChatGPT: 3 [1.75], 3 [1], 3 [2], 3 [0], <em>P</em> = .4, respectively) and the GQS scores were (Human: 4 [1.75], 3 [0.75], 3.5 [2], 3.5 [1], <em>P</em> = .12; ChatGPT: 4 [1], 3 [0.75], 3 [1], 3.5 [1], <em>P</em> = .1, respectively), with no significant difference determined between the scores. The repeatability of the ChatGPT responses was determined to be similar at 25 % for prostate cancer, 30 % for bladder cancer, 30 % for renal cancer, and 35 % for testicular cancer (<em>P</em> = .92). 
No statistically significant difference was determined between the median (IQR) DISCERN-5 and GQS scores given by humans and ChatGPT for the content of videos about prostate, bladder, renal, and testicular cancer (<em>P</em> &gt; .05)<strong>.</strong></p></div><div><h3>Conclusion</h3><p>Although ChatGPT is successful in evaluating the medical quality of video texts, the results should be evaluated with caution as the repeatability of the results is low.</p></div>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1558767324001162","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}

Abstract

Aim

To examine the reliability of ChatGPT in evaluating the quality of medical content of the most watched videos related to urological cancers on YouTube.

Material and methods

In March 2024, a playlist was created of the 20 most-watched YouTube videos for each type of urological cancer. The video texts were evaluated by ChatGPT and by a urology specialist using the DISCERN-5 and Global Quality Scale (GQS) questionnaires. The resulting scores were compared using the Kruskal-Wallis test.
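The Kruskal-Wallis test used here compares score distributions across the four cancer-type groups by pooling and ranking all scores. As an illustration only, a minimal stdlib sketch of the H statistic might look like the following (the scores are made up, and this sketch omits the tie correction that standard packages such as `scipy.stats.kruskal` apply, so it is not a substitute for the study's actual analysis):

```python
def ranks(values):
    """1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic for two or more groups (no tie correction)."""
    pooled = [v for g in groups for v in g]
    r = ranks(pooled)
    n = len(pooled)
    h, idx = 0.0, 0
    for g in groups:
        rank_sum = sum(r[idx:idx + len(g)])
        idx += len(g)
        h += rank_sum ** 2 / len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

# Illustrative (made-up) DISCERN-5 scores for four cancer-type groups:
prostate = [4, 4, 3, 5, 4]
bladder = [3, 3, 3, 4, 3]
renal = [3, 2, 4, 3, 3]
testis = [3, 4, 3, 3, 2]
print(round(kruskal_h(prostate, bladder, renal, testis), 3))
```

The H statistic is then referred to a chi-squared distribution with (number of groups − 1) degrees of freedom to obtain the P value.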

Results

For the prostate, bladder, renal, and testicular cancer videos respectively, the median (IQR) DISCERN-5 scores were 4 (1), 3 (0), 3 (2), and 3 (1) from the human evaluator (P = .11) and 3 (1.75), 3 (1), 3 (2), and 3 (0) from ChatGPT (P = .4); the GQS scores were 4 (1.75), 3 (0.75), 3.5 (2), and 3.5 (1) from the human evaluator (P = .12) and 4 (1), 3 (0.75), 3 (1), and 3.5 (1) from ChatGPT (P = .1). No significant difference was found among cancer types for either evaluator. The repeatability of the ChatGPT responses was similarly low across cancer types: 25% for prostate, 30% for bladder, 30% for renal, and 35% for testicular cancer (P = .92). No statistically significant difference was found between the median (IQR) DISCERN-5 and GQS scores given by the human evaluator and by ChatGPT for any cancer type (P > .05).
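The abstract does not define how repeatability was computed; a plausible reading is the share of videos receiving the identical score when ChatGPT re-evaluates the same texts, which could be sketched as follows (the function name and the example scores are hypothetical):

```python
def repeatability(first_run, second_run):
    """Fraction of items given the identical score in two evaluation runs."""
    if len(first_run) != len(second_run):
        raise ValueError("runs must cover the same items in the same order")
    matches = sum(a == b for a, b in zip(first_run, second_run))
    return matches / len(first_run)

# Illustrative: 5 of 20 made-up repeat scores match, i.e. 25% repeatability,
# comparable to the figure reported for the prostate cancer videos.
run1 = [3] * 20
run2 = [3] * 5 + [4] * 15
print(repeatability(run1, run2))
```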

Conclusion

Although ChatGPT performed well in evaluating the medical quality of the video texts, its results should be interpreted with caution because their repeatability is low.
