On preference learning based on sequential Bayesian optimization with pairwise comparison

IF 4.6 | Zone 2 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)

Artificial Intelligence | Pub Date: 2025-08-28 | DOI: 10.1016/j.artint.2025.104400

Tanya Ignatenko, Kirill Kondrashov, Marco Cox, Bert de Vries

Volume 348, Article 104400. Publication type: Journal Article. Open access: no. Full text: https://www.sciencedirect.com/science/article/pii/S0004370225001195

Citations: 0

Abstract

User preference learning is generally a hard problem. Individual preferences are typically unknown even to the users themselves, while the space of choices is infinite. Here we study user preference learning from an information-theoretic perspective. We model preference learning as a system with two interacting sub-systems, one representing a user with his/her preferences and the other representing an agent that has to learn these preferences. The user's behavior is modeled by a parametric preference function. To learn the preferences efficiently and reduce the search space quickly, we propose an agent that interacts with the user to collect the most informative data for learning. The agent presents two proposals to the user for evaluation, and the user rates them based on his/her preference function. We show that the optimum agent strategy for data collection and preference learning is the result of a maximin optimization of the normalized weighted Kullback-Leibler (KL) divergence between the true and the agent-assigned predictive user response distributions. The resulting value of the KL divergence, which we also call the remaining system uncertainty (RSU), provides an efficient performance metric in the absence of ground truth. This metric characterizes how well the agent can predict the user and, thus, the quality of the underlying learned user (preference) model. Our proposed agent comprises sequential mechanisms for user model inference and proposal generation. To infer the user model (preference function), the agent uses approximate Bayesian inference. The data collection strategy is to generate the proposals whose responses help most to resolve the uncertainty associated with predicting the user's responses. The efficiency of our approach is validated by numerical simulations. A real-life example of a preference learning application is also provided.
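The interaction loop described in the abstract (propose a pair, observe the response, update the user model, pick the next most informative pair) can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a hypothetical one-dimensional setting, a quadratic preference function, a logistic binary-choice response model, and an exact posterior on a grid in place of approximate inference. It also swaps in expected information gain (the mutual information between the response and the parameter) as the selection rule, a common stand-in rather than the maximin weighted KL criterion derived in the paper. All names (`GRID`, `PROPOSALS`, `utility`, `p_prefers_first`, `run`) are made up for illustration.

```python
import math
import random

# Hypothetical setup: the user's ideal setting is an unknown theta in [0, 1].
GRID = [i / 50 for i in range(51)]        # candidate values of theta
PROPOSALS = [i / 10 for i in range(11)]   # options the agent may show


def utility(x, theta):
    """Assumed parametric preference function, peaked at theta."""
    return -(x - theta) ** 2


def p_prefers_first(a, b, theta, scale=0.05):
    """Assumed logistic response model: probability the user picks a over b."""
    diff = utility(a, theta) - utility(b, theta)
    return 1.0 / (1.0 + math.exp(-diff / scale))


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) response."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def expected_information_gain(a, b, posterior):
    """Mutual information between the response to (a, b) and theta."""
    probs = [p_prefers_first(a, b, t) for t in GRID]
    marginal = sum(w * p for w, p in zip(posterior, probs))
    conditional = sum(w * binary_entropy(p) for w, p in zip(posterior, probs))
    return binary_entropy(marginal) - conditional


def update(posterior, a, b, first_chosen):
    """Exact Bayes update of the grid posterior after one response."""
    unnorm = []
    for w, t in zip(posterior, GRID):
        p = p_prefers_first(a, b, t)
        unnorm.append(w * (p if first_chosen else 1.0 - p))
    z = sum(unnorm)
    return [u / z for u in unnorm]


def run(true_theta=0.62, rounds=30, seed=0):
    """Simulate the agent-user loop against a simulated user."""
    rng = random.Random(seed)
    posterior = [1.0 / len(GRID)] * len(GRID)   # uniform prior over theta
    pairs = [(a, b) for i, a in enumerate(PROPOSALS) for b in PROPOSALS[i + 1:]]
    for _ in range(rounds):
        # Agent: choose the pair whose response is expected to be most informative.
        a, b = max(pairs, key=lambda ab: expected_information_gain(ab[0], ab[1], posterior))
        # Simulated user: answers stochastically according to the response model.
        first_chosen = rng.random() < p_prefers_first(a, b, true_theta)
        posterior = update(posterior, a, b, first_chosen)
    return posterior


posterior = run()
estimate = sum(w * t for w, t in zip(posterior, GRID))   # posterior mean of theta
print(f"posterior mean of theta: {estimate:.3f}")
```

In this sketch the entropy of `posterior` plays a role loosely comparable to a remaining-uncertainty measure: it can be tracked without access to `true_theta`, which is only used to simulate the user's answers.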

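Source journal

Artificial Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.20
Self-citation rate: 1.40%
Articles published: 118
Review time: 8 months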

About the journal: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.