{"title":"Debias可能不可靠:在评估去偏差建议时减少偏差问题","authors":"Chengbing Wang, Wentao Shi, Jizhi Zhang, Wenjie Wang, Hang Pan, Fuli Feng","doi":"arxiv-2409.04810","DOIUrl":null,"url":null,"abstract":"Recent work has improved recommendation models remarkably by equipping them\nwith debiasing methods. Due to the unavailability of fully-exposed datasets,\nmost existing approaches resort to randomly-exposed datasets as a proxy for\nevaluating debiased models, employing traditional evaluation scheme to\nrepresent the recommendation performance. However, in this study, we reveal\nthat traditional evaluation scheme is not suitable for randomly-exposed\ndatasets, leading to inconsistency between the Recall performance obtained\nusing randomly-exposed datasets and that obtained using fully-exposed datasets.\nSuch inconsistency indicates the potential unreliability of experiment\nconclusions on previous debiasing techniques and calls for unbiased Recall\nevaluation using randomly-exposed datasets. To bridge the gap, we propose the\nUnbiased Recall Evaluation (URE) scheme, which adjusts the utilization of\nrandomly-exposed datasets to unbiasedly estimate the true Recall performance on\nfully-exposed datasets. We provide theoretical evidence to demonstrate the\nrationality of URE and perform extensive experiments on real-world datasets to\nvalidate its soundness.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"2 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Debias Can be Unreliable: Mitigating Bias Issue in Evaluating Debiasing Recommendation\",\"authors\":\"Chengbing Wang, Wentao Shi, Jizhi Zhang, Wenjie Wang, Hang Pan, Fuli Feng\",\"doi\":\"arxiv-2409.04810\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent work has improved recommendation models remarkably by equipping them\\nwith debiasing methods. Due to the unavailability of fully-exposed datasets,\\nmost existing approaches resort to randomly-exposed datasets as a proxy for\\nevaluating debiased models, employing traditional evaluation scheme to\\nrepresent the recommendation performance. However, in this study, we reveal\\nthat traditional evaluation scheme is not suitable for randomly-exposed\\ndatasets, leading to inconsistency between the Recall performance obtained\\nusing randomly-exposed datasets and that obtained using fully-exposed datasets.\\nSuch inconsistency indicates the potential unreliability of experiment\\nconclusions on previous debiasing techniques and calls for unbiased Recall\\nevaluation using randomly-exposed datasets. To bridge the gap, we propose the\\nUnbiased Recall Evaluation (URE) scheme, which adjusts the utilization of\\nrandomly-exposed datasets to unbiasedly estimate the true Recall performance on\\nfully-exposed datasets. 
We provide theoretical evidence to demonstrate the\\nrationality of URE and perform extensive experiments on real-world datasets to\\nvalidate its soundness.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.04810\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.04810","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Debias Can be Unreliable: Mitigating Bias Issue in Evaluating Debiasing Recommendation
Recent work has substantially improved recommendation models by equipping them with debiasing methods. Because fully-exposed datasets are unavailable, most existing approaches resort to randomly-exposed datasets as a proxy for evaluating debiased models, applying the traditional evaluation scheme to measure recommendation performance. In this study, however, we reveal that the traditional evaluation scheme is not suitable for randomly-exposed datasets: the Recall performance measured on randomly-exposed datasets is inconsistent with that measured on fully-exposed datasets. This inconsistency suggests that experimental conclusions drawn about previous debiasing techniques may be unreliable, and it calls for unbiased Recall evaluation using randomly-exposed datasets. To bridge the gap, we propose the Unbiased Recall Evaluation (URE) scheme, which adjusts how randomly-exposed datasets are used so as to unbiasedly estimate the true Recall performance on fully-exposed datasets. We provide theoretical evidence for the rationality of URE and conduct extensive experiments on real-world datasets to validate its soundness.
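The abstract does not give the URE estimator itself, but the inconsistency it describes is easy to reproduce in simulation. Below is a minimal, self-contained sketch (all data, sizes, and the uniform exposure rate are hypothetical, not taken from the paper) contrasting the traditional Recall@K computed on a randomly-exposed subset with the Recall@K on the fully-exposed ground truth it is meant to proxy.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, K = 1000, 500, 10
expose_prob = 0.2  # assumed uniform exposure rate of the randomly-exposed split

# Hypothetical ground truth: per-user binary relevance and model scores
# mildly correlated with relevance (illustrative, not from the paper).
relevance = rng.random((n_users, n_items)) < 0.05
scores = rng.random((n_users, n_items)) + 0.5 * relevance

def recall_at_k(scores, relevance, mask, k):
    """Traditional Recall@k restricted to items visible under the exposure mask."""
    recalls = []
    for u in range(scores.shape[0]):
        visible = np.flatnonzero(mask[u])          # items exposed to user u
        pos = visible[relevance[u, visible]]       # visible true positives
        if pos.size == 0:
            continue
        top_k = visible[np.argsort(-scores[u, visible])[:k]]
        recalls.append(np.intersect1d(top_k, pos).size / pos.size)
    return float(np.mean(recalls))

full_mask = np.ones((n_users, n_items), dtype=bool)          # fully-exposed oracle
random_mask = rng.random((n_users, n_items)) < expose_prob   # randomly-exposed proxy

print("fully-exposed Recall@K  :", recall_at_k(scores, relevance, full_mask, K))
print("randomly-exposed Recall@K:", recall_at_k(scores, relevance, random_mask, K))
```

In this toy setup the two numbers differ systematically: ranking within a candidate set shrunk by the exposure rate, while keeping the same cutoff K, inflates the measured Recall. This is the kind of bias the traditional evaluation scheme introduces on randomly-exposed data, and which URE is designed to correct; the paper's actual adjustment should be taken from the source, not from this sketch.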