A Comparison of ChatGPT and Human Questionnaire Evaluations of the Urological Cancer Videos Most Watched on YouTube

Aykut Demirci
{"title":"A Comparison of ChatGPT and Human Questionnaire Evaluations of the Urological Cancer Videos Most Watched on YouTube","authors":"Aykut Demirci","doi":"10.1016/j.clgc.2024.102145","DOIUrl":null,"url":null,"abstract":"<div><h3>Aim</h3><p>To examine the reliability of ChatGPT in evaluating the quality of medical content of the most watched videos related to urological cancers on YouTube.</p></div><div><h3>Material and methods</h3><p>In March 2024 a playlist was created of the first 20 videos watched on YouTube for each type of urological cancer. The video texts were evaluated by ChatGPT and by a urology specialist using the DISCERN-5 and Global Quality Scale (GQS) questionnaires. The results obtained were compared using the Kruskal-Wallis test.</p></div><div><h3>Results</h3><p>For the prostate, bladder, renal, and testicular cancer videos, the median (IQR) DISCERN-5 scores given by the human evaluator and ChatGPT were (Human: 4 [1], 3 [0], 3 [2], 3 [1], <em>P</em> = .11; ChatGPT: 3 [1.75], 3 [1], 3 [2], 3 [0], <em>P</em> = .4, respectively) and the GQS scores were (Human: 4 [1.75], 3 [0.75], 3.5 [2], 3.5 [1], <em>P</em> = .12; ChatGPT: 4 [1], 3 [0.75], 3 [1], 3.5 [1], <em>P</em> = .1, respectively), with no significant difference determined between the scores. The repeatability of the ChatGPT responses was determined to be similar at 25 % for prostate cancer, 30 % for bladder cancer, 30 % for renal cancer, and 35 % for testicular cancer (<em>P</em> = .92). 
No statistically significant difference was determined between the median (IQR) DISCERN-5 and GQS scores given by humans and ChatGPT for the content of videos about prostate, bladder, renal, and testicular cancer (<em>P</em> &gt; .05)<strong>.</strong></p></div><div><h3>Conclusion</h3><p>Although ChatGPT is successful in evaluating the medical quality of video texts, the results should be evaluated with caution as the repeatability of the results is low.</p></div>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1558767324001162","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}

Abstract

Aim

To examine the reliability of ChatGPT in evaluating the quality of medical content of the most watched videos related to urological cancers on YouTube.

Material and methods

In March 2024, a playlist was created of the 20 most-watched YouTube videos for each type of urological cancer. The video texts were evaluated by ChatGPT and by a urology specialist using the DISCERN-5 and Global Quality Scale (GQS) questionnaires. The resulting scores were compared using the Kruskal-Wallis test.
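The Kruskal-Wallis test used here compares score distributions across the four cancer-type groups by pooling and ranking all scores. As an illustration only, a minimal stdlib sketch of the H statistic might look like the following (the scores are made up, and this sketch omits the tie correction that standard packages such as `scipy.stats.kruskal` apply, so it is not a substitute for the study's actual analysis):

```python
def ranks(values):
    """1-based ranks of `values`, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic for two or more groups (no tie correction)."""
    pooled = [v for g in groups for v in g]
    r = ranks(pooled)
    n = len(pooled)
    h, idx = 0.0, 0
    for g in groups:
        rank_sum = sum(r[idx:idx + len(g)])
        idx += len(g)
        h += rank_sum ** 2 / len(g)
    return 12.0 / (n * (n + 1)) * h - 3 * (n + 1)

# Illustrative (made-up) DISCERN-5 scores for four cancer-type groups:
prostate = [4, 4, 3, 5, 4]
bladder = [3, 3, 3, 4, 3]
renal = [3, 2, 4, 3, 3]
testis = [3, 4, 3, 3, 2]
print(round(kruskal_h(prostate, bladder, renal, testis), 3))
```

The H statistic is then referred to a chi-squared distribution with (number of groups − 1) degrees of freedom to obtain the P value.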

Results

For the prostate, bladder, renal, and testicular cancer videos respectively, the median (IQR) DISCERN-5 scores were 4 (1), 3 (0), 3 (2), and 3 (1) from the human evaluator (P = .11) and 3 (1.75), 3 (1), 3 (2), and 3 (0) from ChatGPT (P = .4); the GQS scores were 4 (1.75), 3 (0.75), 3.5 (2), and 3.5 (1) from the human evaluator (P = .12) and 4 (1), 3 (0.75), 3 (1), and 3.5 (1) from ChatGPT (P = .1). No significant difference was found among cancer types for either evaluator. The repeatability of the ChatGPT responses was similarly low across cancer types: 25% for prostate, 30% for bladder, 30% for renal, and 35% for testicular cancer (P = .92). No statistically significant difference was found between the median (IQR) DISCERN-5 and GQS scores given by the human evaluator and by ChatGPT for any cancer type (P > .05).
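The abstract does not define how repeatability was computed; a plausible reading is the share of videos receiving the identical score when ChatGPT re-evaluates the same texts, which could be sketched as follows (the function name and the example scores are hypothetical):

```python
def repeatability(first_run, second_run):
    """Fraction of items given the identical score in two evaluation runs."""
    if len(first_run) != len(second_run):
        raise ValueError("runs must cover the same items in the same order")
    matches = sum(a == b for a, b in zip(first_run, second_run))
    return matches / len(first_run)

# Illustrative: 5 of 20 made-up repeat scores match, i.e. 25% repeatability,
# comparable to the figure reported for the prostate cancer videos.
run1 = [3] * 20
run2 = [3] * 5 + [4] * 15
print(repeatability(run1, run2))
```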

Conclusion

Although ChatGPT performed well in evaluating the medical quality of the video texts, its results should be interpreted with caution because their repeatability is low.
