On the challenges of studying bias in Recommender Systems: A UserKNN case study

arXiv - CS - Information Retrieval · Pub Date: 2024-09-12 · DOI: arxiv-2409.08046

Savvina Daniil, Manel Slokom, Mirjam Cuper, Cynthia C. S. Liem, Jacco van Ossenbruggen, Laura Hollink

Citations: 0

Abstract

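Statements on the propagation of bias by recommender systems are often hard to verify or falsify. Research on bias tends to draw from a small pool of publicly available datasets and is therefore bound by their specific properties. Additionally, implementation choices are often not explicitly described or motivated in research, while they may have an effect on bias propagation. In this paper, we explore the challenges of measuring and reporting popularity bias. We showcase the impact of data properties and algorithm configurations on popularity bias by combining synthetic data with well known recommender systems frameworks that implement UserKNN. First, we identify data characteristics that might impact popularity bias, based on the functionality of UserKNN. Accordingly, we generate various datasets that combine these characteristics. Second, we locate UserKNN configurations that vary across implementations in literature. We evaluate popularity bias for five synthetic datasets and five UserKNN configurations, and offer insights on their joint effect. We find that, depending on the data characteristics, various UserKNN configurations can lead to different conclusions regarding the propagation of popularity bias. These results motivate the need for explicitly addressing algorithmic configuration and data properties when reporting and interpreting bias in recommender systems.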
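The abstract does not describe the authors' implementation or list their five configurations. As a minimal sketch under our own assumptions, the Python below implements a plain UserKNN recommender (cosine similarity, similarity-weighted average of neighbour ratings) with one hypothetical switch, `per_item_neighbors`, to illustrate the kind of implementation choice that can differ between frameworks: picking the k nearest users once per target user, or picking them separately among the users who rated each candidate item. It also computes a ΔGAP-style popularity-bias score, i.e. the relative change between the average popularity of items in user profiles and in their recommendation lists. The toy data, all function names, and the choice of this metric are illustrative assumptions, not the paper's datasets, configurations, or code.

```python
import math
from typing import Dict, List

# user id -> {item id: rating}
Ratings = Dict[int, Dict[int, float]]

# Hypothetical toy data, purely illustrative (not one of the paper's datasets).
EXAMPLE_RATINGS: Ratings = {
    0: {0: 5, 1: 4},
    1: {0: 4, 1: 5, 2: 3},
    2: {0: 5, 3: 4},
    3: {0: 3, 1: 4, 4: 5},
}


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two sparse rating vectors (0.0 if no overlap)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings: Ratings, user: int, k: int, n: int,
              per_item_neighbors: bool) -> List[int]:
    """Top-n UserKNN recommendations for `user` (hypothetical configuration).

    per_item_neighbors=False: the k most similar users are chosen once, and an
    item is scored only from those of them who rated it.
    per_item_neighbors=True: for each candidate item, the k most similar users
    are chosen among the users who rated that item.
    """
    sims = {v: cosine(ratings[user], ratings[v]) for v in ratings if v != user}
    by_sim = sorted(sims, key=lambda v: (-sims[v], v))  # ties broken by id
    global_nbrs = by_sim[:k]
    candidates = {i for r in ratings.values() for i in r} - set(ratings[user])
    scores: Dict[int, float] = {}
    for item in candidates:
        if per_item_neighbors:
            nbrs = [v for v in by_sim if item in ratings[v]][:k]
        else:
            nbrs = [v for v in global_nbrs if item in ratings[v]]
        nbrs = [v for v in nbrs if sims[v] > 0]  # ignore non-similar users
        if not nbrs:
            continue  # item cannot be scored for this user
        # Similarity-weighted average of the neighbours' ratings.
        num = sum(sims[v] * ratings[v][item] for v in nbrs)
        den = sum(sims[v] for v in nbrs)
        scores[item] = num / den
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


def item_popularity(ratings: Ratings) -> Dict[int, float]:
    """Fraction of users who rated each item."""
    counts: Dict[int, int] = {}
    for profile in ratings.values():
        for item in profile:
            counts[item] = counts.get(item, 0) + 1
    return {item: c / len(ratings) for item, c in counts.items()}


def delta_gap(ratings: Ratings, recs: Dict[int, List[int]]) -> float:
    """ΔGAP-style score: relative change in mean item popularity from user
    profiles to recommendation lists (users with empty lists are skipped).
    Positive values mean recommendations are more popular than profiles."""
    pop = item_popularity(ratings)
    users = [u for u, items in recs.items() if items]
    if not users:
        raise ValueError("no user received any recommendation")

    def avg_pop(items) -> float:
        items = list(items)
        return sum(pop[i] for i in items) / len(items)

    gap_profile = sum(avg_pop(ratings[u]) for u in users) / len(users)
    gap_recs = sum(avg_pop(recs[u]) for u in users) / len(users)
    return (gap_recs - gap_profile) / gap_profile


if __name__ == "__main__":
    for per_item in (False, True):
        recs = {u: recommend(EXAMPLE_RATINGS, u, k=1, n=1,
                             per_item_neighbors=per_item)
                for u in EXAMPLE_RATINGS}
        print(f"per_item_neighbors={per_item}: recs={recs}, "
              f"delta_gap={delta_gap(EXAMPLE_RATINGS, recs):+.3f}")
```

On this toy data with k = 1, the two settings already recommend different items to user 0 (item 2 when neighbours are chosen once, item 4 when they are chosen per item), a small instance of the configuration sensitivity the abstract describes.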
