Toward Bias-Agnostic Recommender Systems: A Universal Generative Framework

IF 9.1 · Zone 2 (Computer Science) · Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Information Systems · Pub Date: 2024-04-02 · DOI: 10.1145/3655617

Zhidan Wang, Lixin Zou, Chenliang Li, Shuaiqiang Wang, Xu Chen, Dawei Yin, Weidong Liu


Citations: 0

Abstract

User behavior data, such as ratings and clicks, has been widely used to build personalized models for recommender systems. However, many undesirable factors (e.g., popularity, ranking position, users’ selection) significantly affect the performance of the learned recommendation model. Most existing work on unbiased recommendation addresses these biases at the sample level (e.g., sample reweighting, data augmentation) or from the perspective of representation learning (e.g., bias modeling). However, these methods are usually designed for a specific bias and lack the universal capability to handle complex situations where multiple biases co-exist. Moreover, few methods free themselves from laborious and sophisticated debiasing configurations (e.g., propensity scores, imputed values, or the user behavior-generating process).

To address this research gap, we propose a universal Generative framework for Bias Disentanglement, termed GBD, which continually generates calibration perturbations for the intermediate representations during training to keep them from being affected by bias. Specifically, we first introduce a bias identifier that tries to retrieve bias-related information from the representations. The calibration perturbations are then generated to significantly degrade the bias identifier’s performance, so that the bias is gradually disentangled from the calibrated representations. As a result, a bias-agnostic model is obtained under the guidance of the bias identifier, without relying on such debiasing configurations. We further demonstrate its universality by subsuming representative biases and their mixtures under the proposed framework. Finally, extensive experiments on real-world, synthetic, and semi-synthetic datasets demonstrate the superiority of the proposed approach over a wide range of recommendation debiasing methods. The code is available at https://github.com/Zhidan-Wang/GBD.
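The interplay between the bias identifier and the calibration perturbations can be illustrated with a toy sketch. This is not the authors' implementation (see the repository above for that). It assumes a linear logistic bias identifier, a binary bias label, and a simple gradient-based perturbation in place of whatever generator the paper actually uses. All function and variable names here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def identifier_loss(w, z, b):
    """Binary cross-entropy of a linear bias identifier on representations z."""
    p = sigmoid(z @ w)
    eps = 1e-12
    return float(-np.mean(b * np.log(p + eps) + (1 - b) * np.log(1 - p + eps)))

def fit_identifier(z, b, steps=200, lr=0.5):
    """Fit the (assumed) linear identifier by plain gradient descent."""
    w = np.zeros(z.shape[1])
    for _ in range(steps):
        grad_w = z.T @ (sigmoid(z @ w) - b) / len(b)
        w -= lr * grad_w
    return w

def calibration_perturbation(w, z, b, step=0.5):
    """Per-sample shift of z along the gradient that increases the identifier's loss.

    Assumption: a single gradient-ascent step stands in for the paper's generator.
    """
    grad_z = np.outer(sigmoid(z @ w) - b, w)  # d(per-sample BCE)/dz
    return step * grad_z

# Synthetic representations with a bias signal injected into dimension 0.
rng = np.random.default_rng(0)
n, d = 400, 8
b = rng.integers(0, 2, size=n).astype(float)  # hypothetical binary bias label
z = rng.normal(size=(n, d))
z[:, 0] += 2.0 * (2 * b - 1)

w = fit_identifier(z, b)                           # identifier retrieves the bias
loss_before = identifier_loss(w, z, b)
z_cal = z + calibration_perturbation(w, z, b)      # calibrated representations
loss_after = identifier_loss(w, z_cal, b)          # identifier now performs worse
```

The sketch shows only one round. In a training loop along the lines the abstract describes, the identifier would presumably be refit on the calibrated representations and the perturbation regenerated, repeating throughout training.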


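Source journal

ACM Transactions on Information Systems (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 9.40

Self-citation rate: 14.30%

Articles published: 165

Review time: >12 weeks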

About the journal: The ACM Transactions on Information Systems (TOIS) publishes papers on information retrieval (such as search engines, recommender systems) that contain: new principled information retrieval models or algorithms with sound empirical validation; observational, experimental and/or theoretical studies yielding new insights into information retrieval or information seeking; accounts of applications of existing information retrieval techniques that shed light on the strengths and weaknesses of the techniques; formalization of new information retrieval or information seeking tasks and of methods for evaluating the performance on those tasks; development of content (text, image, speech, video, etc.) analysis methods to support information retrieval and information seeking; development of computational models of user information preferences and interaction behaviors; creation and analysis of evaluation methodologies for information retrieval and information seeking; or surveys of existing work that propose a significant synthesis. The information retrieval scope of ACM Transactions on Information Systems (TOIS) appeals to industry practitioners for its wealth of creative ideas, and to academic researchers for its descriptions of their colleagues' work.