Can LLMs Predict Patient Treatment Choices? A Discrete Choice Experiment Framework.

Impact Factor: 6.0 · JCR Q1 (Economics) · CAS Zone 2 (Medicine)
Tina Cheng, Juan Marcos Gonzalez, Matthew M Engelhard, Shelby Reed, Semra Ozdemir
Journal: Value in Health · DOI: 10.1016/j.jval.2026.04.006 · Published: 2026-05-05 · Citations: 0

Abstract

Objectives: This study evaluated the viability of large language models (LLMs), specifically GPT-4, in predicting patients' health-preference-consistent choices using a discrete choice experiment (DCE) framework.

Methods: Synthetic data were generated from real DCE responses by patients with a history of cancer. The analytical dataset included 50 synthetic patients, each answering 48 two-alternative treatment choice questions that varied in expected survival, chance of long-term survival, health limitations, and out-of-pocket cost. GPT-4's predictive performance was assessed across four experiments. In Experiments 1 and 2, GPT-4 predicted 20 hold-out questions (i.e., new choice questions) using 28 fixed (Experiment 1) or randomly selected (Experiment 2) sample questions. Experiment 3 varied the number of sample questions to examine prediction accuracy and prediction confidence. Experiment 4 evaluated how characteristics of the hold-out questions influenced prediction accuracy.
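The few-shot setup described above can be sketched as follows. The abstract does not give the study's actual prompt wording, attribute labels, or question format, so everything here — the function names, attribute keys, and phrasing — is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of building a few-shot DCE prompt for an LLM.
# Attribute names and wording are assumptions, not the study's prompt.

def format_question(q):
    """Render one two-alternative treatment choice question as text."""
    lines = []
    for label in ("A", "B"):
        alt = q[label]
        lines.append(
            f"Treatment {label}: expected survival {alt['survival_years']} years, "
            f"{alt['long_term_chance']}% chance of long-term survival, "
            f"health limitations: {alt['limitations']}, "
            f"out-of-pocket cost ${alt['cost']}"
        )
    return "\n".join(lines)

def build_prompt(sample_questions, holdout_question):
    """Few-shot prompt: answered sample questions, then one hold-out question."""
    parts = ["A patient answered the following treatment choice questions."]
    for q in sample_questions:
        parts.append(format_question(q))
        parts.append(f"The patient chose Treatment {q['choice']}.")
    parts.append("Which treatment would this patient choose next?")
    parts.append(format_question(holdout_question))
    parts.append("Answer with A or B.")
    return "\n\n".join(parts)
```

In Experiments 1 and 2, `sample_questions` would hold the 28 fixed or randomly selected questions; in Experiment 3 its length would be varied.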

Results: GPT-4 achieved an average prediction accuracy of 70.5% (95% confidence interval [CI]: 68.3%-72.7%) in Experiment 1 and 69.9% (95% CI: 66.9%-72.9%) in Experiment 2, with greater variability when sample questions were randomized. Experiment 3 revealed a learning curve, where accuracy improved from 53% with 5 sample questions to 64% with 10, after which performance plateaued. Experiment 4 showed higher prediction accuracy for questions with more salient attribute differences.
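Accuracy figures like those above can be summarized from per-question binary outcomes with a normal-approximation confidence interval. This is a generic sketch, not the study's analysis code; the reported CIs may have been computed differently (e.g., clustered by patient).

```python
import math

def accuracy_with_ci(n_correct, n_total, z=1.96):
    """Proportion correct with a Wald (normal-approximation) 95% CI."""
    p = n_correct / n_total
    se = math.sqrt(p * (1 - p) / n_total)
    return p, p - z * se, p + z * se

# Illustrative only: 705 correct out of 1000 hold-out predictions
# (50 patients x 20 questions) matches the ~70.5% accuracy reported above.
p, lo, hi = accuracy_with_ci(705, 1000)
```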

Conclusions: GPT-4 demonstrated the ability to infer patient preferences from limited samples, achieving accuracy levels comparable to surrogate decision-makers. Its performance remained consistent across randomized input sequences and improved as the number of sample questions increased, eventually reaching a plateau beyond which additional sample questions yielded diminishing returns.

Source Journal

Value in Health (Medicine – Health Care)
CiteScore: 6.90 · Self-citation rate: 6.70% · Annual articles: 3064 · Review time: 3-8 weeks

Journal description: Value in Health contains original research articles in pharmacoeconomics, health economics, and outcomes research (clinical, economic, and patient-reported outcomes/preference-based research), as well as conceptual and health policy articles that provide valuable information for health care decision-makers and the research community. As the official journal of ISPOR, Value in Health provides a forum for researchers and health care decision-makers to translate outcomes research into health care decisions.