Automated production of comparison tables for shared decision making: Comparing a human-generated table (Option Grid), a search engine process, and outputs from four large language models

IF 3.1 | Medicine, CAS Tier 2 | JCR Q2, Public, Environmental & Occupational Health
Padhraig Ryan, Glyn Elwyn
{"title":"自动生成用于共享决策制定的比较表:比较人工生成的表(Option Grid)、搜索引擎流程和来自四个大型语言模型的输出","authors":"Padhraig Ryan ,&nbsp;Glyn Elwyn","doi":"10.1016/j.pec.2025.109356","DOIUrl":null,"url":null,"abstract":"<div><h3>Objectives</h3><div>To explore the ability of artificial intelligence to produce comparison tables to facilitate shared decision-making.</div></div><div><h3>Methods</h3><div>An expert human-generated comparison table (Option Grid ™) was compared to four comparison tables produced by large language models and one produced using a Google search process that a patient might undertake. Each table was prepared for a patient with osteoarthritis of the knee, considering a knee replacement. The results were compared to the Option Grid.™</div><div>The information items in each comparison table were divided into eight categories: the intervention process; benefits; side effects &amp; adverse effects; pre-operative care; post-operative care &amp; physical recovery; repeat surgery; decision-making process; and alternative interventions. We assessed the accuracy of each information item in a binary manner (accurate, inaccurate).</div></div><div><h3>Results</h3><div>OpenBioLLM-70b and two proprietary ChatGPT models generated similar frequencies of information items across most categories, but omitted information on alternative interventions. The Google search process yielded the highest number of information items (n = 41), and OpenBioLLM-8b yielded the lowest (n = 20). Accuracy, compared to the human Option Grid, was 97 % for the ChatGPT models and the open-source OpenBioLLM-70b, and 95 % for OpenBioLLM-8b and the Google search process. The human-generated Option Grid had superior readability.</div></div><div><h3>Conclusions</h3><div>Large language models produced comparison tables that are 3–5 % less accurate than a human generated Option Grid. Comparison tables produced by large language models may be less readable and require additional checking and editing.</div></div><div><h3>Practice implications</h3><div>Subject to fact-checking and feedback, large language models may have a role to play in scaling up the production of evidence-based comparison tables that could assist patients and others.</div></div>","PeriodicalId":49714,"journal":{"name":"Patient Education and Counseling","volume":"142 ","pages":"Article 109356"},"PeriodicalIF":3.1000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Automated production of comparison tables for shared decision making: Comparing a human-generated table (Option Grid), a search engine process, and outputs from four large language models\",\"authors\":\"Padhraig Ryan ,&nbsp;Glyn Elwyn\",\"doi\":\"10.1016/j.pec.2025.109356\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Objectives</h3><div>To explore the ability of artificial intelligence to produce comparison tables to facilitate shared decision-making.</div></div><div><h3>Methods</h3><div>An expert human-generated comparison table (Option Grid ™) was compared to four comparison tables produced by large language models and one produced using a Google search process that a patient might undertake. Each table was prepared for a patient with osteoarthritis of the knee, considering a knee replacement. 
The results were compared to the Option Grid.™</div><div>The information items in each comparison table were divided into eight categories: the intervention process; benefits; side effects &amp; adverse effects; pre-operative care; post-operative care &amp; physical recovery; repeat surgery; decision-making process; and alternative interventions. We assessed the accuracy of each information item in a binary manner (accurate, inaccurate).</div></div><div><h3>Results</h3><div>OpenBioLLM-70b and two proprietary ChatGPT models generated similar frequencies of information items across most categories, but omitted information on alternative interventions. The Google search process yielded the highest number of information items (n = 41), and OpenBioLLM-8b yielded the lowest (n = 20). Accuracy, compared to the human Option Grid, was 97 % for the ChatGPT models and the open-source OpenBioLLM-70b, and 95 % for OpenBioLLM-8b and the Google search process. The human-generated Option Grid had superior readability.</div></div><div><h3>Conclusions</h3><div>Large language models produced comparison tables that are 3–5 % less accurate than a human generated Option Grid. Comparison tables produced by large language models may be less readable and require additional checking and editing.</div></div><div><h3>Practice implications</h3><div>Subject to fact-checking and feedback, large language models may have a role to play in scaling up the production of evidence-based comparison tables that could assist patients and others.</div></div>\",\"PeriodicalId\":49714,\"journal\":{\"name\":\"Patient Education and Counseling\",\"volume\":\"142 \",\"pages\":\"Article 109356\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Patient Education and Counseling\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0738399125007232\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Patient Education and Counseling","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0738399125007232","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH","Score":null,"Total":0}
Citations: 0

Abstract

Objectives

To explore the ability of artificial intelligence to produce comparison tables to facilitate shared decision-making.

Methods

An expert human-generated comparison table (Option Grid™) was compared to four comparison tables produced by large language models and one produced using a Google search process that a patient might undertake. Each table was prepared for a patient with osteoarthritis of the knee who was considering a knee replacement. The results were compared to the Option Grid™.
The information items in each comparison table were divided into eight categories: the intervention process; benefits; side effects & adverse effects; pre-operative care; post-operative care & physical recovery; repeat surgery; decision-making process; and alternative interventions. We assessed the accuracy of each information item in a binary manner (accurate, inaccurate).
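As a concrete illustration of this coding scheme, the minimal sketch below tallies binary accuracy ratings per source; the source names, ratings, and function are hypothetical placeholders, not the study's actual data or workflow.

```python
# Minimal sketch of the binary accuracy tally described in the Methods.
# Sources, categories, and ratings below are hypothetical placeholders.
from collections import Counter

# Each rated item: (source, category, accurate?)
ratings = [
    ("ChatGPT model A", "benefits", True),
    ("ChatGPT model A", "repeat surgery", True),
    ("Google search", "benefits", False),
    ("Google search", "pre-operative care", True),
]

def summarize(ratings):
    """Return per-source (item count, % of items rated accurate)."""
    totals, accurate = Counter(), Counter()
    for source, _category, is_accurate in ratings:
        totals[source] += 1
        accurate[source] += int(is_accurate)
    return {s: (totals[s], 100 * accurate[s] / totals[s]) for s in totals}

for source, (n_items, pct) in summarize(ratings).items():
    print(f"{source}: {n_items} items, {pct:.0f}% accurate")
```

A tally of this kind yields directly the per-source item counts (e.g., 41 for the Google search process, 20 for OpenBioLLM-8b) and accuracy percentages reported in the Results.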

Results

OpenBioLLM-70b and two proprietary ChatGPT models generated similar frequencies of information items across most categories, but omitted information on alternative interventions. The Google search process yielded the highest number of information items (n = 41), and OpenBioLLM-8b yielded the lowest (n = 20). Accuracy, compared to the human Option Grid, was 97 % for the ChatGPT models and the open-source OpenBioLLM-70b, and 95 % for OpenBioLLM-8b and the Google search process. The human-generated Option Grid had superior readability.

Conclusions

Large language models produced comparison tables that were 3–5 % less accurate than a human-generated Option Grid. Comparison tables produced by large language models may be less readable and require additional checking and editing.
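The abstract does not name the readability metric used, so the following is only a hedged sketch of how such a comparison could be run, assuming a Flesch Reading Ease style measure via the open-source textstat package; the table texts are placeholders.

```python
# Illustrative readability comparison; the abstract does not specify the
# metric the authors used, so Flesch Reading Ease is an assumption here.
# Requires: pip install textstat
import textstat

# Placeholder snippets standing in for the full comparison-table texts.
tables = {
    "Option Grid (human)": "Most people feel less pain after a knee replacement.",
    "LLM-generated table": "Total knee arthroplasty is a surgical intervention indicated for end-stage osteoarthritis.",
}

for name, text in tables.items():
    # Higher Flesch Reading Ease scores indicate easier-to-read text.
    print(f"{name}: Flesch Reading Ease = {textstat.flesch_reading_ease(text):.1f}")
```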

Practice implications

Subject to fact-checking and feedback, large language models may have a role to play in scaling up the production of evidence-based comparison tables that could assist patients and others.
Source journal
Patient Education and Counseling (Medicine: Public, Environmental & Occupational Health)
CiteScore: 5.60
Self-citation rate: 11.40%
Annual articles: 384
Review time: 46 days
About the journal: Patient Education and Counseling is an interdisciplinary, international journal for patient education and health promotion researchers, managers and clinicians. The journal seeks to explore and elucidate educational, counseling and communication models in health care. Its aim is to provide a forum for fundamental as well as applied research, to promote the study of organizational issues involved in the delivery of patient education, counseling and health promotion services, and to examine training models for improving communication between providers and patients.