Enhancing Preference-based Linear Bandits via Human Response Time
Shen Li, Yuyang Zhang, Zhaolin Ren, Claire Liang, Na Li, Julie A. Shah
arXiv - ECON - Econometrics, 2024-09-09, arXiv:2409.05798
Abstract
Binary human choice feedback is widely used in interactive preference
learning for its simplicity, but it provides limited information about
preference strength. To overcome this limitation, we leverage human response
times, which inversely correlate with preference strength, as complementary
information. Our work integrates the EZ-diffusion model, which jointly models
human choices and response times, into preference-based linear bandits. We
introduce a computationally efficient utility estimator that reformulates the
utility estimation problem using both choices and response times as a linear
regression problem. Theoretical and empirical comparisons with traditional
choice-only estimators reveal that for queries with strong preferences ("easy"
queries), choices alone provide limited information, while response times offer
valuable complementary information about preference strength. As a result,
incorporating response times makes easy queries more useful. We demonstrate
this advantage in the fixed-budget best-arm identification problem, with
simulations based on three real-world datasets, consistently showing
accelerated learning when response times are incorporated.
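The idea of turning choices and response times into a linear regression can be illustrated with a minimal sketch. It is not the authors' estimator; it only assumes a symmetric drift-diffusion process with unit noise, zero non-decision time, barriers at +/- `barrier`, and a drift equal to the utility difference `x @ theta`. Under those assumptions the expected choice divided by the expected decision time equals `drift / barrier`, which is linear in `x`. The names `ddm_moments`, `theta`, `barrier`, and `X` are hypothetical and chosen for illustration only.

```python
import numpy as np

def ddm_moments(drift, barrier):
    """Closed-form moments of a symmetric drift-diffusion process that starts
    at 0, has unit noise, and is absorbed at +barrier or -barrier.
    Returns (E[choice], E[decision time]) with choice coded as +1 / -1."""
    drift = np.asarray(drift, dtype=float)
    mean_choice = np.tanh(barrier * drift)
    # E[t] = (barrier / drift) * tanh(barrier * drift); the limit at drift = 0 is barrier**2.
    safe = np.where(drift == 0.0, 1.0, drift)
    mean_time = np.where(drift == 0.0,
                         barrier ** 2,
                         barrier / safe * np.tanh(barrier * safe))
    return mean_choice, mean_time

# Hypothetical ground truth, used only to generate the illustration.
theta = np.array([1.0, -0.5])      # utility parameters (assumed)
barrier = 1.5                      # decision barrier (assumed)

# Each row is the feature difference between the two arms of one query.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, -1.0]])
drift = X @ theta                  # utility difference per query

mean_choice, mean_time = ddm_moments(drift, barrier)

# Regression target: E[choice] / E[time] = drift / barrier, linear in X.
# With real data the two expectations would be replaced by per-query sample means.
y = mean_choice / mean_time

# Ordinary least squares recovers theta scaled by 1 / barrier.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In this sketch the last query has the largest utility difference, so its expected choice is already close to the ceiling of 1, while its expected decision time still changes with the preference strength. That is the sense in which response times keep "easy" queries informative. The recovered `theta_hat` is `theta` divided by `barrier`, so utilities are identified only up to that scale.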