{"title":"Debias可能不可靠:在评估去偏差建议时减少偏差问题","authors":"Chengbing Wang, Wentao Shi, Jizhi Zhang, Wenjie Wang, Hang Pan, Fuli Feng","doi":"arxiv-2409.04810","DOIUrl":null,"url":null,"abstract":"Recent work has improved recommendation models remarkably by equipping them\nwith debiasing methods. Due to the unavailability of fully-exposed datasets,\nmost existing approaches resort to randomly-exposed datasets as a proxy for\nevaluating debiased models, employing traditional evaluation scheme to\nrepresent the recommendation performance. However, in this study, we reveal\nthat traditional evaluation scheme is not suitable for randomly-exposed\ndatasets, leading to inconsistency between the Recall performance obtained\nusing randomly-exposed datasets and that obtained using fully-exposed datasets.\nSuch inconsistency indicates the potential unreliability of experiment\nconclusions on previous debiasing techniques and calls for unbiased Recall\nevaluation using randomly-exposed datasets. To bridge the gap, we propose the\nUnbiased Recall Evaluation (URE) scheme, which adjusts the utilization of\nrandomly-exposed datasets to unbiasedly estimate the true Recall performance on\nfully-exposed datasets. We provide theoretical evidence to demonstrate the\nrationality of URE and perform extensive experiments on real-world datasets to\nvalidate its soundness.","PeriodicalId":501281,"journal":{"name":"arXiv - CS - Information Retrieval","volume":"2 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Debias Can be Unreliable: Mitigating Bias Issue in Evaluating Debiasing Recommendation\",\"authors\":\"Chengbing Wang, Wentao Shi, Jizhi Zhang, Wenjie Wang, Hang Pan, Fuli Feng\",\"doi\":\"arxiv-2409.04810\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent work has improved recommendation models remarkably by equipping them\\nwith debiasing methods. Due to the unavailability of fully-exposed datasets,\\nmost existing approaches resort to randomly-exposed datasets as a proxy for\\nevaluating debiased models, employing traditional evaluation scheme to\\nrepresent the recommendation performance. However, in this study, we reveal\\nthat traditional evaluation scheme is not suitable for randomly-exposed\\ndatasets, leading to inconsistency between the Recall performance obtained\\nusing randomly-exposed datasets and that obtained using fully-exposed datasets.\\nSuch inconsistency indicates the potential unreliability of experiment\\nconclusions on previous debiasing techniques and calls for unbiased Recall\\nevaluation using randomly-exposed datasets. To bridge the gap, we propose the\\nUnbiased Recall Evaluation (URE) scheme, which adjusts the utilization of\\nrandomly-exposed datasets to unbiasedly estimate the true Recall performance on\\nfully-exposed datasets. 
We provide theoretical evidence to demonstrate the\\nrationality of URE and perform extensive experiments on real-world datasets to\\nvalidate its soundness.\",\"PeriodicalId\":501281,\"journal\":{\"name\":\"arXiv - CS - Information Retrieval\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.04810\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.04810","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Debias Can be Unreliable: Mitigating Bias Issue in Evaluating Debiasing Recommendation
Recent work has substantially improved recommendation models by equipping them with debiasing methods. Because fully-exposed datasets are unavailable, most existing approaches resort to randomly-exposed datasets as a proxy for evaluating debiased models, applying the traditional evaluation scheme to measure recommendation performance. In this study, however, we reveal that the traditional evaluation scheme is not suitable for randomly-exposed datasets: the Recall performance measured on randomly-exposed datasets is inconsistent with that measured on fully-exposed datasets. This inconsistency suggests that experimental conclusions drawn about previous debiasing techniques may be unreliable, and it calls for unbiased Recall evaluation using randomly-exposed datasets. To bridge the gap, we propose the Unbiased Recall Evaluation (URE) scheme, which adjusts how randomly-exposed datasets are used so as to unbiasedly estimate the true Recall performance on fully-exposed datasets. We provide theoretical evidence for the rationality of URE and conduct extensive experiments on real-world datasets to validate its soundness.
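The abstract does not give the URE estimator itself, but the inconsistency it describes is easy to reproduce in simulation. Below is a minimal, self-contained sketch (all data, sizes, and the uniform exposure rate are hypothetical, not taken from the paper) contrasting the traditional Recall@K computed on a randomly-exposed subset with the Recall@K on the fully-exposed ground truth it is meant to proxy.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, K = 1000, 500, 10
expose_prob = 0.2  # assumed uniform exposure rate of the randomly-exposed split

# Hypothetical ground truth: per-user binary relevance and model scores
# mildly correlated with relevance (illustrative, not from the paper).
relevance = rng.random((n_users, n_items)) < 0.05
scores = rng.random((n_users, n_items)) + 0.5 * relevance

def recall_at_k(scores, relevance, mask, k):
    """Traditional Recall@k restricted to items visible under the exposure mask."""
    recalls = []
    for u in range(scores.shape[0]):
        visible = np.flatnonzero(mask[u])          # items exposed to user u
        pos = visible[relevance[u, visible]]       # visible true positives
        if pos.size == 0:
            continue
        top_k = visible[np.argsort(-scores[u, visible])[:k]]
        recalls.append(np.intersect1d(top_k, pos).size / pos.size)
    return float(np.mean(recalls))

full_mask = np.ones((n_users, n_items), dtype=bool)          # fully-exposed oracle
random_mask = rng.random((n_users, n_items)) < expose_prob   # randomly-exposed proxy

print("fully-exposed Recall@K  :", recall_at_k(scores, relevance, full_mask, K))
print("randomly-exposed Recall@K:", recall_at_k(scores, relevance, random_mask, K))
```

In this toy setup the two numbers differ systematically: ranking within a candidate set shrunk by the exposure rate, while keeping the same cutoff K, inflates the measured Recall. This is the kind of bias the traditional evaluation scheme introduces on randomly-exposed data, and which URE is designed to correct; the paper's actual adjustment should be taken from the source, not from this sketch.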