{"title":"自动生成用于共享决策制定的比较表:比较人工生成的表(Option Grid)、搜索引擎流程和来自四个大型语言模型的输出","authors":"Padhraig Ryan , Glyn Elwyn","doi":"10.1016/j.pec.2025.109356","DOIUrl":null,"url":null,"abstract":"<div><h3>Objectives</h3><div>To explore the ability of artificial intelligence to produce comparison tables to facilitate shared decision-making.</div></div><div><h3>Methods</h3><div>An expert human-generated comparison table (Option Grid ™) was compared to four comparison tables produced by large language models and one produced using a Google search process that a patient might undertake. Each table was prepared for a patient with osteoarthritis of the knee, considering a knee replacement. The results were compared to the Option Grid.™</div><div>The information items in each comparison table were divided into eight categories: the intervention process; benefits; side effects & adverse effects; pre-operative care; post-operative care & physical recovery; repeat surgery; decision-making process; and alternative interventions. We assessed the accuracy of each information item in a binary manner (accurate, inaccurate).</div></div><div><h3>Results</h3><div>OpenBioLLM-70b and two proprietary ChatGPT models generated similar frequencies of information items across most categories, but omitted information on alternative interventions. The Google search process yielded the highest number of information items (n = 41), and OpenBioLLM-8b yielded the lowest (n = 20). Accuracy, compared to the human Option Grid, was 97 % for the ChatGPT models and the open-source OpenBioLLM-70b, and 95 % for OpenBioLLM-8b and the Google search process. The human-generated Option Grid had superior readability.</div></div><div><h3>Conclusions</h3><div>Large language models produced comparison tables that are 3–5 % less accurate than a human generated Option Grid. Comparison tables produced by large language models may be less readable and require additional checking and editing.</div></div><div><h3>Practice implications</h3><div>Subject to fact-checking and feedback, large language models may have a role to play in scaling up the production of evidence-based comparison tables that could assist patients and others.</div></div>","PeriodicalId":49714,"journal":{"name":"Patient Education and Counseling","volume":"142 ","pages":"Article 109356"},"PeriodicalIF":3.1000,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Automated production of comparison tables for shared decision making: Comparing a human-generated table (Option Grid), a search engine process, and outputs from four large language models\",\"authors\":\"Padhraig Ryan , Glyn Elwyn\",\"doi\":\"10.1016/j.pec.2025.109356\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Objectives</h3><div>To explore the ability of artificial intelligence to produce comparison tables to facilitate shared decision-making.</div></div><div><h3>Methods</h3><div>An expert human-generated comparison table (Option Grid ™) was compared to four comparison tables produced by large language models and one produced using a Google search process that a patient might undertake. Each table was prepared for a patient with osteoarthritis of the knee, considering a knee replacement. 
The results were compared to the Option Grid.™</div><div>The information items in each comparison table were divided into eight categories: the intervention process; benefits; side effects & adverse effects; pre-operative care; post-operative care & physical recovery; repeat surgery; decision-making process; and alternative interventions. We assessed the accuracy of each information item in a binary manner (accurate, inaccurate).</div></div><div><h3>Results</h3><div>OpenBioLLM-70b and two proprietary ChatGPT models generated similar frequencies of information items across most categories, but omitted information on alternative interventions. The Google search process yielded the highest number of information items (n = 41), and OpenBioLLM-8b yielded the lowest (n = 20). Accuracy, compared to the human Option Grid, was 97 % for the ChatGPT models and the open-source OpenBioLLM-70b, and 95 % for OpenBioLLM-8b and the Google search process. The human-generated Option Grid had superior readability.</div></div><div><h3>Conclusions</h3><div>Large language models produced comparison tables that are 3–5 % less accurate than a human generated Option Grid. Comparison tables produced by large language models may be less readable and require additional checking and editing.</div></div><div><h3>Practice implications</h3><div>Subject to fact-checking and feedback, large language models may have a role to play in scaling up the production of evidence-based comparison tables that could assist patients and others.</div></div>\",\"PeriodicalId\":49714,\"journal\":{\"name\":\"Patient Education and Counseling\",\"volume\":\"142 \",\"pages\":\"Article 109356\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-09-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Patient Education and Counseling\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0738399125007232\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Patient Education and Counseling","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0738399125007232","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH","Score":null,"Total":0}
Automated production of comparison tables for shared decision making: Comparing a human-generated table (Option Grid), a search engine process, and outputs from four large language models
Objectives
To explore the ability of artificial intelligence to produce comparison tables to facilitate shared decision-making.
Methods
An expert, human-generated comparison table (Option Grid™) was compared with four comparison tables produced by large language models and one produced using a Google search process that a patient might undertake. Each table was prepared for a patient with osteoarthritis of the knee who was considering a knee replacement, and the results were compared against the Option Grid™.
The information items in each comparison table were divided into eight categories: intervention process; benefits; side effects & adverse effects; pre-operative care; post-operative care & physical recovery; repeat surgery; decision-making process; and alternative interventions. The accuracy of each information item was assessed on a binary scale (accurate or inaccurate).
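To make the counting and scoring concrete, the sketch below shows one way such a binary tally could be computed. The category labels follow the Methods above; the data structure, function name, and example ratings are our own illustration, since the assessments in the study were made by the authors rather than programmatically.

```python
from collections import Counter

# Categories as listed in the Methods; the ratings further below are
# hypothetical placeholders, not the study's data.
CATEGORIES = [
    "intervention process",
    "benefits",
    "side effects & adverse effects",
    "pre-operative care",
    "post-operative care & physical recovery",
    "repeat surgery",
    "decision-making process",
    "alternative interventions",
]

def accuracy(items: list[tuple[str, bool]]) -> float:
    """Share of information items judged accurate (True) vs inaccurate (False)."""
    return sum(ok for _, ok in items) / len(items)

# Hypothetical ratings for one comparison table: (category, judged_accurate).
table_items = [
    ("benefits", True),
    ("side effects & adverse effects", True),
    ("post-operative care & physical recovery", False),
]

per_category = Counter(category for category, _ in table_items)
print(f"n = {len(table_items)} items, accuracy = {accuracy(table_items):.0%}")
print("item frequency by category:", dict(per_category))
```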
Results
OpenBioLLM-70b and two proprietary ChatGPT models generated similar frequencies of information items across most categories, but omitted information on alternative interventions. The Google search process yielded the highest number of information items (n = 41), and OpenBioLLM-8b yielded the lowest (n = 20). Accuracy, compared to the human-generated Option Grid, was 97 % for the ChatGPT models and the open-source OpenBioLLM-70b, and 95 % for OpenBioLLM-8b and the Google search process. The human-generated Option Grid had superior readability.
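Read as simple proportions of items judged accurate (accuracy = accurate items ÷ total items), the reported figures imply, for example, 19 of 20 items accurate for OpenBioLLM-8b (19/20 = 95 %) and 39 of 41 for the Google search process (39/41 ≈ 95 %). This reconstruction is ours; raw counts per source are not reported in the abstract.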
Conclusions
The large language models produced comparison tables that were 3–5 percentage points less accurate than the human-generated Option Grid. Comparison tables produced by large language models may also be less readable, and require additional checking and editing.
Practice implications
Subject to fact-checking and feedback, large language models may have a role to play in scaling up the production of evidence-based comparison tables that could assist patients and others.
Journal introduction:
Patient Education and Counseling is an interdisciplinary, international journal for patient education and health promotion researchers, managers, and clinicians. The journal seeks to explore and elucidate educational, counseling, and communication models in health care. Its aim is to provide a forum for fundamental as well as applied research, and to promote the study of organizational issues involved in the delivery of patient education, counseling, and health promotion services, as well as training models for improving communication between providers and patients.