DRM4Rec: A Doubly Robust Matching Approach for Recommender System Evaluation

IF 7.5 · Zone 1 (Computer Science) · Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Expert Systems with Applications · Pub Date: 2025-04-11 · DOI: 10.1016/j.eswa.2025.127431

Zhen Li, Jibin Wang, Zhuo Chen, Kun Wu, Liang Liu, Meng Ai, Li Liu

Volume 280, Article 127431 · https://www.sciencedirect.com/science/article/pii/S095741742501053X

Citations: 0


DRM4Rec: A Doubly Robust Matching Approach for Recommender System Evaluation

Ranking is a core task in recommender systems, and a variety of ranking metrics have been designed to assess the quality of ordered item lists. A reliable evaluation method is crucial for identifying the best recommendation algorithm, and it directly affects downstream tasks such as click-through rate prediction and post-click conversion rate estimation. Estimating these ranking metrics is challenging, however, because the historical data in recommender systems (RS) carry various biases: the data used for evaluation may differ significantly from the target environment in which the recommendation algorithm is to be deployed. Recent work proposes pseudo-labeling and reweighting for debiasing, and thereby for unbiased evaluation. Although theoretically promising, both approaches lack stability. Because the collected feedback is biased, pseudo-labeling relies directly on extrapolation; and because data sparsity is common in real-world scenarios, propensity-based weighting suffers from large variance when propensities are small. In this paper, we propose a novel Doubly Robust Matching for Recommendation (DRM4Rec) method to achieve unbiased ranking metric evaluation. Compared with existing approaches, DRM4Rec reduces the otherwise unavoidable high variance caused by small propensities and mitigates the direct harm that incorrect extrapolation does to prediction performance. The proposed method is also doubly robust: it achieves unbiased ranking metric evaluation when either the imputed relevance or the learned propensities are accurate. We conduct extensive semi-synthetic and real-world experiments to evaluate three representative recommendation models, and the results show that DRM4Rec significantly improves unbiased ranking metric evaluation.
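The double-robustness property mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the DRM4Rec method itself (the abstract does not describe its matching step); it is the textbook doubly robust estimator applied to a DCG-weighted relevance average, on synthetic data where relevant items are more likely to be observed. All names (`simulate`, `dcg_weight`, `bad_r_hat`, `bad_p_hat`) and all numeric settings are assumptions made for illustration.

```python
import numpy as np


def dcg_weight(rank):
    """DCG-style position discount, used here as a per-pair metric weight."""
    return 1.0 / np.log2(rank + 1.0)


def naive(w, r, o):
    """Average over observed pairs only; ignores selection bias."""
    return float(np.sum(w * o * r) / np.sum(o))


def direct_method(w, r_hat):
    """Pseudo-labeling: trust the imputed relevance everywhere."""
    return float(np.mean(w * r_hat))


def ips(w, r, o, p_hat):
    """Inverse propensity scoring: reweight observed feedback by 1 / p_hat."""
    return float(np.mean(w * o * r / p_hat))


def doubly_robust(w, r, o, r_hat, p_hat):
    """Imputation plus a propensity-weighted correction on observed pairs.

    Unobserved relevance never contributes, because the correction term
    is multiplied by the observation indicator o.
    """
    return float(np.mean(w * (r_hat + o * (r - r_hat) / p_hat)))


def simulate(seed=0, n=200_000):
    """Synthetic user-item pairs with relevance-dependent observation."""
    rng = np.random.default_rng(seed)
    rank = rng.integers(1, 11, size=n)   # ranks 1..10 from a hypothetical ranker
    w = dcg_weight(rank)
    r = rng.binomial(1, 0.3, size=n)     # true binary relevance
    p = np.where(r == 1, 0.5, 0.1)       # true propensities (selection bias)
    o = rng.binomial(1, p)               # 1 if feedback was observed

    bad_r_hat = np.full(n, 0.9)          # deliberately wrong imputation
    bad_p_hat = np.full(n, 0.3)          # deliberately wrong propensities
    oracle_r_hat = r.astype(float)       # idealized, perfectly accurate imputation

    return {
        "truth": float(np.mean(w * r)),
        "naive": naive(w, r, o),
        "dm_bad_imputation": direct_method(w, bad_r_hat),
        "ips_bad_propensity": ips(w, r, o, bad_p_hat),
        "dr_bad_imputation": doubly_robust(w, r, o, bad_r_hat, p),
        "dr_bad_propensity": doubly_robust(w, r, o, oracle_r_hat, bad_p_hat),
    }


for name, value in simulate().items():
    print(f"{name:>20s}: {value:.4f}")
```

In this setup the two doubly robust variants should land close to `truth` even though each uses one misspecified component, while the naive average, the direct method with a bad imputation, and IPS with bad propensities do not. The variance problem with small propensities is still visible here: the `1 / p_hat` factor is what makes the correction term noisy, which is the instability the abstract says DRM4Rec aims to reduce.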


Source journal

Expert Systems with Applications (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.