Evaluating the Quality of Health Information: Comparison of Human and Artificial Intelligence.

Impact Factor 2.9 · CAS Tier 3 (Medicine) · JCR Q1 (Clinical Neurology)
Dhruva Arcot, Neha Pondicherry, Subhankar Chakraborty
DOI: 10.1111/nmo.70164 · Neurogastroenterology and Motility · Published 2025-09-24 · Article e70164 · Citations: 0

Abstract

Background: Over half of all Americans seek health-related information online, yet the quality of this digital content remains largely unregulated and variable. The DISCERN score, a validated 15-item instrument, offers a structured method to assess the reliability of written health information. While expert-assigned DISCERN scores have been widely applied across various disease states, whether artificial intelligence (AI) can automate this evaluation remains unknown. Specifically, it is unclear whether AI-generated DISCERN scores align with those assigned by human experts. Our study seeks to investigate this gap in knowledge by examining the correlation between AI-generated and human-assigned DISCERN scores for TikTok videos on Irritable Bowel Syndrome (IBS).

Methods: A set of 100 TikTok videos on IBS previously scored using DISCERN by two physicians was chosen. Sixty-nine videos contained transcribable spoken audio, which was processed using a free online transcription tool. The remaining videos featured songs or music unsuitable for transcription, had been deleted, or were no longer publicly available. The audio transcripts were prefixed with an identical prompt and submitted to two common AI models, ChatGPT 4.0 and Microsoft Copilot, for DISCERN score evaluation. The average DISCERN score for each transcript was compared between the AI models and against the mean of the DISCERN scores given by the human reviewers using Pearson correlation (r) and the Kruskal-Wallis test.
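The Pearson comparison described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the score lists below are invented placeholders standing in for the per-video mean human and AI DISCERN scores.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical per-video scores (the study used 69 transcripts, not shown here).
human = [3.5, 2.0, 4.0, 2.5, 3.0, 1.5, 4.5, 2.0]  # mean of two physicians
ai    = [3.0, 2.5, 4.5, 2.0, 3.5, 2.0, 4.0, 2.5]  # one AI model's scores

print(round(pearson_r(human, ai), 2))  # → 0.87
```

In practice one would use `scipy.stats.pearsonr` and `scipy.stats.kruskal`, which also return p-values; the hand-rolled version above only shows what r measures.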

Results: There was a significant correlation between human and AI-generated DISCERN scores (r = 0.60-0.65). When categorized by the background of the content creators (medical, N = 26, versus non-medical, N = 43), the correlation was significant only for content made by non-medical content creators (r = 0.69-0.75, p < 0.001). Correlation between ChatGPT and Copilot DISCERN scores was stronger for videos by non-medical content creators (r = 0.66) than those by medical content creators (r = 0.43). On linear regression, ChatGPT's DISCERN scores explained 55.6% of the variation in human DISCERN scores for videos by non-medical creators, compared to 8.9% for videos by medical creators. For Copilot, the corresponding values were 47.2% and 9.3%.
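The regression figures follow directly from the correlations: in simple linear regression with one predictor, the proportion of variance explained (R²) is the square of Pearson's r, so the reported r range for non-medical creators (0.69-0.75) is consistent with the roughly 47-56% variance-explained values. A quick illustrative check:

```python
# R-squared in one-predictor linear regression equals Pearson's r squared.
# Squaring the correlation range reported in the Results:
for r in (0.60, 0.65, 0.69, 0.75):
    print(f"r = {r:.2f} -> R^2 = {r * r:.1%}")
```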

Conclusion: AI models demonstrated moderate alignment with human-assigned DISCERN scores for IBS-related TikTok videos, but only when content was produced by non-medical creators. The weaker correlation for content produced by those with a medical background suggests limitations in current AI models' ability to interpret nuanced or technical health information. These findings highlight the need for further validation across broader topics, languages, platforms, and reviewer pools. If refined, AI-generated DISCERN scoring could serve as a scalable tool to help users assess the reliability of health information on social media and curb misinformation.

Source journal: Neurogastroenterology and Motility (Medicine, Clinical Neurology)
CiteScore: 7.80 · Self-citation rate: 8.60% · Articles per year: 178 · Review time: 3-6 weeks
Journal description: Neurogastroenterology & Motility (NMO) is the official journal of the European Society of Neurogastroenterology & Motility (ESNM) and the American Neurogastroenterology and Motility Society (ANMS). It is edited by James Galligan, Albert Bredenoord, and Stephen Vanner. The editorial and peer review process is independent of the societies affiliated with the journal and of the publisher: neither the ANMS, the ESNM, nor the publisher has editorial decision-making power. Whenever these are relevant to the content being considered or published, the editors, journal management committee, and editorial board declare their interests and affiliations.