An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems

Peng Liu, Jiawei Zhu, Cong Xu, Ming Zhao, Bin Wang
{"title":"An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems","authors":"Peng Liu, Jiawei Zhu, Cong Xu, Ming Zhao, Bin Wang","doi":"arxiv-2409.11678","DOIUrl":null,"url":null,"abstract":"As the last key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF)\nis in charge of combining multiple scores predicted by Multi-Task Learning\n(MTL) into a final score to maximize user satisfaction, which decides the\nultimate recommendation results. In recent years, to maximize long-term user\nsatisfaction within a recommendation session, Reinforcement Learning (RL) is\nwidely used for MTF in large-scale RSs. However, limited by their modeling\npattern, all the current RL-MTF methods can only utilize user features as the\nstate to generate actions for each user, but unable to make use of item\nfeatures and other valuable features, which leads to suboptimal results.\nAddressing this problem is a challenge that requires breaking through the\ncurrent modeling pattern of RL-MTF. To solve this problem, we propose a novel\nmethod called Enhanced-State RL for MTF in RSs. Unlike the existing methods\nmentioned above, our method first defines user features, item features, and\nother valuable features collectively as the enhanced state; then proposes a\nnovel actor and critic learning process to utilize the enhanced state to make\nmuch better action for each user-item pair. To the best of our knowledge, this\nnovel modeling pattern is being proposed for the first time in the field of\nRL-MTF. We conduct extensive offline and online experiments in a large-scale\nRS. The results demonstrate that our model outperforms other models\nsignificantly. Enhanced-State RL has been fully deployed in our RS more than\nhalf a year, improving +3.84% user valid consumption and +0.58% user duration\ntime compared to baseline.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11678","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

As the last key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF) is responsible for combining the multiple scores predicted by Multi-Task Learning (MTL) into a final score that maximizes user satisfaction, and thereby determines the ultimate recommendation results. In recent years, to maximize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been widely used for MTF in large-scale RSs. However, limited by their modeling pattern, all current RL-MTF methods can only use user features as the state to generate an action for each user; they are unable to exploit item features and other valuable features, which leads to suboptimal results. Addressing this problem requires breaking through the current modeling pattern of RL-MTF. To this end, we propose a novel method called Enhanced-State RL for MTF in RSs. Unlike the existing methods mentioned above, our method first defines user features, item features, and other valuable features collectively as the enhanced state; it then proposes a novel actor and critic learning process that utilizes the enhanced state to generate a much better action for each user-item pair. To the best of our knowledge, this is the first time such a modeling pattern has been proposed in the field of RL-MTF. We conduct extensive offline and online experiments in a large-scale RS. The results demonstrate that our model significantly outperforms other models. Enhanced-State RL has been fully deployed in our RS for more than half a year, improving user valid consumption by +3.84% and user duration time by +0.58% compared to the baseline.
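To make the enhanced-state idea concrete, below is a minimal sketch (not the authors' implementation) of an actor that conditions on user, item, and other cross features rather than user features alone, and emits per-task fusion weights for a single user-item pair. All names, dimensions, and the weighted-geometric fusion form are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of an enhanced-state actor for RL-based MTF.
# Assumptions: feature dimensions, network sizes, and the fusion
# formula below are hypothetical, chosen only for illustration.
import torch
import torch.nn as nn


class EnhancedStateActor(nn.Module):
    def __init__(self, user_dim=64, item_dim=32, cross_dim=16, num_tasks=4):
        super().__init__()
        # The "enhanced state" concatenates user, item, and other features.
        state_dim = user_dim + item_dim + cross_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_tasks),
            nn.Softplus(),  # keep the per-task fusion weights positive
        )

    def forward(self, user_feat, item_feat, cross_feat):
        # Conventional RL-MTF would feed only user_feat here; adding item
        # and other valuable features is the enhanced-state change, so the
        # action can differ per user-item pair instead of per user.
        state = torch.cat([user_feat, item_feat, cross_feat], dim=-1)
        return self.net(state)


def fuse(mtl_scores, weights):
    # One common MTF form (an assumption, not the paper's exact formula):
    # final score = prod_k s_k ** w_k, a weighted geometric combination
    # of the MTL task scores.
    return torch.prod(mtl_scores ** weights, dim=-1)


# Usage: one user-item pair with four MTL task scores in (0, 1).
actor = EnhancedStateActor()
user = torch.randn(1, 64)
item = torch.randn(1, 32)
cross = torch.randn(1, 16)
w = actor(user, item, cross)             # per-pair fusion weights (action)
final_score = fuse(torch.rand(1, 4), w)  # ranking score for this pair
```

In this sketch the critic (omitted) would score the same enhanced state and action; the key design point is simply that the action is a function of the user-item pair, not of the user alone.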