Enhancing Preference-based Linear Bandits via Human Response Time

arXiv - ECON - Econometrics · Pub Date: 2024-09-09 · DOI: arxiv-2409.05798

Shen Li, Yuyang Zhang, Zhaolin Ren, Claire Liang, Na Li, Julie A. Shah



Abstract

Binary human choice feedback is widely used in interactive preference learning for its simplicity, but it provides limited information about preference strength. To overcome this limitation, we leverage human response times, which inversely correlate with preference strength, as complementary information. Our work integrates the EZ-diffusion model, which jointly models human choices and response times, into preference-based linear bandits. We introduce a computationally efficient utility estimator that reformulates the utility estimation problem using both choices and response times as a linear regression problem. Theoretical and empirical comparisons with traditional choice-only estimators reveal that for queries with strong preferences ("easy" queries), choices alone provide limited information, while response times offer valuable complementary information about preference strength. As a result, incorporating response times makes easy queries more useful. We demonstrate this advantage in the fixed-budget best-arm identification problem, with simulations based on three real-world datasets, consistently showing accelerated learning when response times are incorporated.
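Below is a minimal Python sketch of why choices plus response times turn utility estimation into a linear regression, under stated assumptions. It assumes the symmetric EZ-diffusion parameterization: evidence starts midway between barriers at +a and -a, noise has unit variance, the drift equals the utility difference θᵀz between the two options of a query, and response times are pure decision times (non-decision time already removed). Under these assumptions the expected choice is E[c] = tanh(a·θᵀz) and the expected decision time is E[t] = (a / θᵀz)·tanh(a·θᵀz). Their ratio is therefore E[c]/E[t] = θᵀz / a, which is linear in z, so least squares can recover θ/a. The function names, the per-query mean-ratio estimator, and the parameter values are hypothetical illustrations, not the authors' implementation. The `sensitivity` helper illustrates the "easy query" effect: when the drift is large, E[c] is saturated near ±1 and barely changes with the utility difference, whereas E[t] still changes noticeably.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical sketch, not the authors' code. Assumes the symmetric
# EZ-diffusion setup: evidence starts midway between barriers at +a and -a,
# unit noise, drift equal to the utility difference theta . z, and response
# times that are pure decision times (non-decision time already subtracted).


def expected_choice(drift: float, a: float) -> float:
    """E[c] for a choice c in {-1, +1}: tanh(a * drift)."""
    return math.tanh(a * drift)


def expected_decision_time(drift: float, a: float) -> float:
    """E[t] = (a / drift) * tanh(a * drift), which tends to a**2 as drift -> 0."""
    if abs(drift) < 1e-12:
        return a * a
    return (a / drift) * math.tanh(a * drift)


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def estimate_theta_over_a(
    queries: Sequence[Sequence[float]],
    choices: Sequence[Sequence[float]],
    times: Sequence[Sequence[float]],
) -> List[float]:
    """Least squares of mean(choice) / mean(time) on z; the target is theta / a."""
    d = len(queries[0])
    gram = [[0.0] * d for _ in range(d)]  # accumulates sum of z z^T
    rhs = [0.0] * d                       # accumulates sum of z * ratio
    for z, cs, ts in zip(queries, choices, times):
        ratio = (sum(cs) / len(cs)) / (sum(ts) / len(ts))
        for i in range(d):
            rhs[i] += z[i] * ratio
            for j in range(d):
                gram[i][j] += z[i] * z[j]
    return solve_linear(gram, rhs)


def sensitivity(f: Callable[[float, float], float], drift: float, a: float,
                h: float = 1e-6) -> float:
    """Central-difference derivative of f(drift, a) with respect to drift."""
    return (f(drift + h, a) - f(drift - h, a)) / (2 * h)


if __name__ == "__main__":
    theta, a = [1.0, -0.5], 1.5  # illustrative values, not from the paper
    queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
    drifts = [sum(t * zi for t, zi in zip(theta, z)) for z in queries]
    # Noise-free check: plug in the model's expected choice and time per query.
    choices = [[expected_choice(v, a)] for v in drifts]
    times = [[expected_decision_time(v, a)] for v in drifts]
    print(estimate_theta_over_a(queries, choices, times))  # close to [2/3, -1/3]
    # Easy query (drift 2.5) vs hard query (drift -0.5): d E[c] and d E[t].
    for v in (2.5, -0.5):
        print(v, sensitivity(expected_choice, v, a),
              sensitivity(expected_decision_time, v, a))
```

With real data, each query's choices and times would be noisy samples rather than expected values, so the ratio and the estimate are only approximately θ/a. The regression identifies θ only up to the positive scale 1/a. For best-arm identification this is harmless, because ranking arms by θᵀx does not change under a positive rescaling.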


