Peng Liu, Jiawei Zhu, Cong Xu, Ming Zhao, Bin Wang
{"title":"用于大规模推荐系统多任务融合的增强状态强化学习算法","authors":"Peng Liu, Jiawei Zhu, Cong Xu, Ming Zhao, Bin Wang","doi":"arxiv-2409.11678","DOIUrl":null,"url":null,"abstract":"As the last key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF)\nis in charge of combining multiple scores predicted by Multi-Task Learning\n(MTL) into a final score to maximize user satisfaction, which decides the\nultimate recommendation results. In recent years, to maximize long-term user\nsatisfaction within a recommendation session, Reinforcement Learning (RL) is\nwidely used for MTF in large-scale RSs. However, limited by their modeling\npattern, all the current RL-MTF methods can only utilize user features as the\nstate to generate actions for each user, but unable to make use of item\nfeatures and other valuable features, which leads to suboptimal results.\nAddressing this problem is a challenge that requires breaking through the\ncurrent modeling pattern of RL-MTF. To solve this problem, we propose a novel\nmethod called Enhanced-State RL for MTF in RSs. Unlike the existing methods\nmentioned above, our method first defines user features, item features, and\nother valuable features collectively as the enhanced state; then proposes a\nnovel actor and critic learning process to utilize the enhanced state to make\nmuch better action for each user-item pair. To the best of our knowledge, this\nnovel modeling pattern is being proposed for the first time in the field of\nRL-MTF. We conduct extensive offline and online experiments in a large-scale\nRS. The results demonstrate that our model outperforms other models\nsignificantly. Enhanced-State RL has been fully deployed in our RS more than\nhalf a year, improving +3.84% user valid consumption and +0.58% user duration\ntime compared to baseline.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems\",\"authors\":\"Peng Liu, Jiawei Zhu, Cong Xu, Ming Zhao, Bin Wang\",\"doi\":\"arxiv-2409.11678\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the last key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF)\\nis in charge of combining multiple scores predicted by Multi-Task Learning\\n(MTL) into a final score to maximize user satisfaction, which decides the\\nultimate recommendation results. In recent years, to maximize long-term user\\nsatisfaction within a recommendation session, Reinforcement Learning (RL) is\\nwidely used for MTF in large-scale RSs. However, limited by their modeling\\npattern, all the current RL-MTF methods can only utilize user features as the\\nstate to generate actions for each user, but unable to make use of item\\nfeatures and other valuable features, which leads to suboptimal results.\\nAddressing this problem is a challenge that requires breaking through the\\ncurrent modeling pattern of RL-MTF. To solve this problem, we propose a novel\\nmethod called Enhanced-State RL for MTF in RSs. 
Unlike the existing methods\\nmentioned above, our method first defines user features, item features, and\\nother valuable features collectively as the enhanced state; then proposes a\\nnovel actor and critic learning process to utilize the enhanced state to make\\nmuch better action for each user-item pair. To the best of our knowledge, this\\nnovel modeling pattern is being proposed for the first time in the field of\\nRL-MTF. We conduct extensive offline and online experiments in a large-scale\\nRS. The results demonstrate that our model outperforms other models\\nsignificantly. Enhanced-State RL has been fully deployed in our RS more than\\nhalf a year, improving +3.84% user valid consumption and +0.58% user duration\\ntime compared to baseline.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"19 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.11678\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.11678","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems
As the last key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF) is responsible for combining the multiple scores predicted by Multi-Task Learning (MTL) into a final score that maximizes user satisfaction, and this final score determines the ultimate recommendation results. In recent years, to maximize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) has been widely used for MTF in large-scale RSs. However, limited by their modeling pattern, all current RL-MTF methods can only use user features as the state to generate an action for each user; they are unable to make use of item features and other valuable features, which leads to suboptimal results. Addressing this limitation requires breaking through the current modeling pattern of RL-MTF. To this end, we propose a novel method called Enhanced-State RL for MTF in RSs. Unlike the existing methods mentioned above, our method first defines user features, item features, and other valuable features collectively as the enhanced state; it then proposes a novel actor-critic learning process that utilizes the enhanced state to produce a much better action for each user-item pair. To the best of our knowledge, this modeling pattern is proposed for the first time in the field of RL-MTF. We conduct extensive offline and online experiments in a large-scale RS. The results demonstrate that our model significantly outperforms other models. Enhanced-State RL has been fully deployed in our RS for more than half a year, improving user valid consumption by +3.84% and user duration time by +0.58% compared to the baseline.
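
To make the enhanced-state idea concrete, below is a minimal PyTorch sketch of an actor-critic pair whose state concatenates user, item, and other features, so the action is generated per user-item pair rather than per user. All names, dimensions, network shapes, and the weighted-sum fusion formula are illustrative assumptions; the paper's actual architecture and learning process may differ.

```python
# Illustrative sketch only: the "enhanced state" modeling pattern from the
# abstract, under assumed dimensions and an assumed weighted-sum MTF formula.
import torch
import torch.nn as nn


class EnhancedStateActor(nn.Module):
    """Maps the enhanced state (user + item + other features) to fusion
    weights over the K scores produced by the upstream MTL model."""

    def __init__(self, user_dim, item_dim, other_dim, num_tasks, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(user_dim + item_dim + other_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_tasks),
            nn.Softplus(),  # keep fusion weights positive
        )

    def forward(self, user_feat, item_feat, other_feat):
        # Concatenating all feature groups forms the enhanced state, so the
        # resulting action is conditioned on each user-item pair.
        state = torch.cat([user_feat, item_feat, other_feat], dim=-1)
        return self.net(state)  # (batch, num_tasks) fusion weights


class EnhancedStateCritic(nn.Module):
    """Estimates the value of an (enhanced state, action) pair."""

    def __init__(self, user_dim, item_dim, other_dim, num_tasks, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(user_dim + item_dim + other_dim + num_tasks, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, user_feat, item_feat, other_feat, action):
        x = torch.cat([user_feat, item_feat, other_feat, action], dim=-1)
        return self.net(x)  # (batch, 1) value estimate


def fuse_scores(mtl_scores, fusion_weights):
    # One common MTF convention: final ranking score as a weighted sum of
    # the MTL head scores. The paper may use a different fusion formula.
    return (mtl_scores * fusion_weights).sum(dim=-1)
```

The contrast with prior RL-MTF methods is visible in the actor's forward pass: when the state contains only user features, every candidate item for a user shares one action, whereas the enhanced state lets the actor emit distinct fusion weights for each user-item pair.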