Savvina Daniil, Manel Slokom, Mirjam Cuper, Cynthia C. S. Liem, Jacco van Ossenbruggen, Laura Hollink
arXiv - CS - Information Retrieval (arXiv:2409.08046), published 2024-09-12
On the challenges of studying bias in Recommender Systems: A UserKNN case study
Statements on the propagation of bias by recommender systems are often hard
to verify or falsify. Research on bias tends to draw from a small pool of
publicly available datasets and is therefore bound by their specific
properties. Additionally, implementation choices are often not explicitly
described or motivated in research, even though they may affect bias
propagation. In this paper, we explore the challenges of measuring and
reporting popularity bias. We showcase the impact of data properties and
algorithm configurations on popularity bias by combining synthetic data with
well-known recommender-system frameworks that implement UserKNN. First, we
identify data characteristics that might impact popularity bias, based on the
functionality of UserKNN. Accordingly, we generate various datasets that
combine these characteristics. Second, we locate UserKNN configurations that
vary across implementations in the literature. We evaluate popularity bias for five
synthetic datasets and five UserKNN configurations, and offer insights on their
joint effect. We find that, depending on the data characteristics, various
UserKNN configurations can lead to different conclusions regarding the
propagation of popularity bias. These results motivate the need for explicitly
addressing algorithmic configuration and data properties when reporting and
interpreting bias in recommender systems.
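To make the claim that an implementation choice alone can shift popularity-bias conclusions concrete, here is a minimal sketch (not the authors' code and not the API of any framework) of a UserKNN recommender with one configuration knob: whether neighbour ratings are combined as a weighted average or a weighted sum. Popularity bias is measured with Average Recommendation Popularity (ARP). The function names, the cosine similarity over full profiles, the `min_sim` threshold, the choice of ARP as metric, and the toy ratings are all illustrative assumptions. The paper's own configurations, datasets, and metrics may differ.

```python
"""Minimal UserKNN sketch with a configurable aggregation step and an
Average Recommendation Popularity (ARP) measure. Illustrative only."""
import math
from collections import defaultdict

Ratings = dict[str, dict[str, float]]  # user -> {item: rating}

# Hypothetical toy data (NOT the paper's synthetic datasets): "pop" is rated
# by 3 of 4 users with a middling rating, "niche" by 1 user with a high rating.
TOY_RATINGS: Ratings = {
    "u0": {"a": 5, "b": 4},
    "u1": {"a": 5, "b": 4, "pop": 3, "niche": 5},
    "u2": {"a": 4, "b": 5, "pop": 3},
    "u3": {"c": 4, "pop": 3},
}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two rating vectors (missing ratings treated as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)


def user_knn_scores(ratings: Ratings, user: str, k: int = 2,
                    aggregation: str = "weighted_average",
                    min_sim: float = 0.0) -> dict[str, float]:
    """Score the items `user` has not rated, using its k most similar users."""
    if aggregation not in ("weighted_average", "weighted_sum"):
        raise ValueError(f"unknown aggregation: {aggregation}")
    sims = {v: cosine(ratings[user], ratings[v]) for v in ratings if v != user}
    # Global top-k neighbourhood; neighbours at or below min_sim are dropped.
    neighbours = sorted((v for v in sims if sims[v] > min_sim),
                        key=lambda v: (-sims[v], v))[:k]
    num: dict[str, float] = defaultdict(float)
    den: dict[str, float] = defaultdict(float)
    for v in neighbours:
        for item, r in ratings[v].items():
            if item not in ratings[user]:
                num[item] += sims[v] * r
                den[item] += sims[v]
    if aggregation == "weighted_sum":
        # Items rated by more neighbours accumulate larger scores.
        return dict(num)
    # Weighted average: normalises away how many neighbours rated the item.
    return {item: num[item] / den[item] for item in num}


def recommend(ratings: Ratings, user: str, n: int = 1, **config) -> list[str]:
    """Top-n items by score; ties broken alphabetically for determinism."""
    scores = user_knn_scores(ratings, user, **config)
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


def item_popularity(ratings: Ratings) -> dict[str, float]:
    """Fraction of users who rated each item."""
    counts: dict[str, int] = defaultdict(int)
    for profile in ratings.values():
        for item in profile:
            counts[item] += 1
    return {item: c / len(ratings) for item, c in counts.items()}


def average_recommendation_popularity(ratings: Ratings, n: int = 1,
                                      **config) -> float:
    """Mean item popularity over all non-empty top-n lists (higher = more popular)."""
    pop = item_popularity(ratings)
    lists = [recommend(ratings, u, n, **config) for u in ratings]
    lists = [lst for lst in lists if lst]
    return sum(sum(pop[i] for i in lst) / len(lst) for lst in lists) / len(lists)


if __name__ == "__main__":
    for agg in ("weighted_average", "weighted_sum"):
        arp = average_recommendation_popularity(TOY_RATINGS, aggregation=agg)
        print(f"{agg}: ARP = {arp:.3f}")
```

On this toy data, the weighted-sum configuration recommends `pop` to `u0` where the weighted average recommends `niche`. ARP rises from 0.25 to about 0.583. The data and the neighbourhood size are identical in both runs, so this shows in miniature how a single configuration choice can change what one would conclude about popularity-bias propagation.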