Oren Sar Shalom, S. Berkovsky, Royi Ronen, Elad Ziklik, A. Amir
{"title":"推荐系统中的数据质量问题","authors":"Oren Sar Shalom, S. Berkovsky, Royi Ronen, Elad Ziklik, A. Amir","doi":"10.1145/2792838.2799670","DOIUrl":null,"url":null,"abstract":"Although data quality has been recognized as an important factor in the broad information systems research, it has received little attention in recommender systems. Data quality matters are typically addressed in recommenders by ad-hoc cleansing methods, which prune noisy or unreliable records from the data. However, the setting of the cleansing parameters is often done arbitrarily, without thorough consideration of the data characteristics. In this work, we turn to two central data quality problems in recommender systems: sparsity and redundancy. We devise models for setting data-dependent thresholds and sampling levels, and evaluate these using a collection of public and proprietary datasets. We observe that the models accurately predict data cleansing parameters, while having minor effect on the accuracy of the generated recommendations.","PeriodicalId":325637,"journal":{"name":"Proceedings of the 9th ACM Conference on Recommender Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Data Quality Matters in Recommender Systems\",\"authors\":\"Oren Sar Shalom, S. Berkovsky, Royi Ronen, Elad Ziklik, A. Amir\",\"doi\":\"10.1145/2792838.2799670\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Although data quality has been recognized as an important factor in the broad information systems research, it has received little attention in recommender systems. Data quality matters are typically addressed in recommenders by ad-hoc cleansing methods, which prune noisy or unreliable records from the data. However, the setting of the cleansing parameters is often done arbitrarily, without thorough consideration of the data characteristics. In this work, we turn to two central data quality problems in recommender systems: sparsity and redundancy. We devise models for setting data-dependent thresholds and sampling levels, and evaluate these using a collection of public and proprietary datasets. We observe that the models accurately predict data cleansing parameters, while having minor effect on the accuracy of the generated recommendations.\",\"PeriodicalId\":325637,\"journal\":{\"name\":\"Proceedings of the 9th ACM Conference on Recommender Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 9th ACM Conference on Recommender Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2792838.2799670\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 9th ACM Conference on Recommender Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2792838.2799670","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Although data quality has been recognized as an important factor in the broad information systems research, it has received little attention in recommender systems. Data quality matters are typically addressed in recommenders by ad-hoc cleansing methods, which prune noisy or unreliable records from the data. However, the setting of the cleansing parameters is often done arbitrarily, without thorough consideration of the data characteristics. In this work, we turn to two central data quality problems in recommender systems: sparsity and redundancy. We devise models for setting data-dependent thresholds and sampling levels, and evaluate these using a collection of public and proprietary datasets. We observe that the models accurately predict data cleansing parameters, while having minor effect on the accuracy of the generated recommendations.