Zhen Li , Jibin Wang , Zhuo Chen , Kun Wu , Liang Liu , Meng Ai , Li Liu
DOI: 10.1016/j.eswa.2025.127431
Journal: Expert Systems with Applications, vol. 280, Article 127431 (Impact Factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-04-11
URL: https://www.sciencedirect.com/science/article/pii/S095741742501053X
DRM4Rec: A Doubly Robust Matching Approach for Recommender System Evaluation
Ranking is a core task in recommender systems, and various ranking metrics are designed to assess the quality of ordered item lists. A reliable evaluation method is crucial for identifying optimal recommendation algorithms, directly impacting downstream tasks such as click-through rate prediction and post-click conversion rate estimation. However, estimating these ranking metrics is challenging because the historical data in recommender systems (RS) contain various biases; that is, the data used for evaluation may differ significantly from the target environment in which the recommendation algorithm is to be deployed. Recent works propose pseudo-labeling and reweighting for debiasing, thereby achieving unbiased evaluation. Although theoretically promising, both approaches lack stability: the biased feedback forces pseudo-labeling to rely directly on extrapolation, and propensity-based weighting has large variance in the presence of small propensities, which arise from the data sparsity common in real-world scenarios. In this paper, we propose a novel Doubly Robust Matching for Recommendation (DRM4Rec) method to achieve unbiased ranking metric evaluation. Compared to existing approaches, DRM4Rec reduces the otherwise unavoidable high variance caused by small propensities and also mitigates the direct harm to prediction performance from incorrect extrapolations. In addition, the proposed method is doubly robust: it achieves unbiased ranking metric evaluation when either the imputed relevance or the learned propensities are accurate. We conduct extensive semi-synthetic and real-world experiments to evaluate three representative recommendation models, and the results show that DRM4Rec provides significant improvements in unbiased ranking metric evaluation.
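The double-robustness property mentioned in the abstract can be illustrated with the standard doubly robust (DR) estimator from the causal inference literature. The sketch below is not the DRM4Rec matching procedure, which the abstract does not spell out; it only shows the generic combination of imputed relevance and inverse-propensity weighting. All function names, the synthetic data, and the propensity values (0.05 and 0.5) are assumptions made for illustration.

```python
import random


def dr_estimate(rewards, observed, propensities, imputed):
    """Standard doubly robust estimate of the mean relevance over all pairs.

    rewards[i] only contributes where observed[i] == 1.
    """
    total = 0.0
    for r, o, p, r_hat in zip(rewards, observed, propensities, imputed):
        # Imputed value plus an inverse-propensity-weighted residual.
        total += r_hat + o * (r - r_hat) / p
    return total / len(rewards)


def naive_estimate(rewards, observed):
    """Mean relevance over observed pairs only (ignores selection bias)."""
    seen = [r for r, o in zip(rewards, observed) if o == 1]
    return sum(seen) / len(seen)


# Synthetic data (assumed): popular pairs are observed often and tend to be
# relevant; niche pairs are rarely observed and are relevant less often.
rng = random.Random(0)
n = 50000
true_p = [rng.choice([0.05, 0.5]) for _ in range(n)]
rewards = [1.0 if rng.random() < (0.7 if p == 0.5 else 0.3) else 0.0
           for p in true_p]
observed = [1 if rng.random() < p else 0 for p in true_p]
truth = sum(rewards) / n

# Case A: accurate imputation, wrong propensities (constant 0.2).
dr_good_imputation = dr_estimate(rewards, observed, [0.2] * n, rewards)

# Case B: accurate propensities, uninformative imputation (all zeros).
dr_good_propensity = dr_estimate(rewards, observed, true_p, [0.0] * n)

naive = naive_estimate(rewards, observed)

print(f"truth={truth:.3f}  naive={naive:.3f}")
print(f"DR, good imputation={dr_good_imputation:.3f}")
print(f"DR, good propensities={dr_good_propensity:.3f}")
```

In case A the residual term vanishes, so the estimate matches the true mean regardless of the propensities. In case B the estimator reduces to inverse-propensity scoring, which is unbiased but noisy because of the pairs with propensity 0.05; this is the variance issue the abstract attributes to small propensities.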
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.