Training and testing of recommender systems on data missing not at random

Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. Pub Date: 2010-07-25. DOI: 10.1145/1835804.1835895

H. Steck

Citations: 345

Abstract

Users typically rate only a small fraction of all available items. We show that the absence of ratings carries useful information for improving the top-k hit rate with respect to all items, a natural accuracy measure for recommendations. To test recommender systems, we present two performance measures that, under mild assumptions, can be estimated from data without bias even when ratings are missing not at random (MNAR). To achieve optimal test results, we present appropriate surrogate objective functions for efficient training on MNAR data. Their main property is that they account for all ratings, whether observed or missing in the data. Concerning the top-k hit rate on test data, our experiments indicate dramatic improvements over even sophisticated methods that are optimized on observed ratings only.
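The abstract names two ingredients without giving their formulas: a top-k hit rate computed over all items, and a training objective that accounts for every user-item pair, observed or missing. The Python sketch below illustrates one plausible reading of each. The function names, the dictionary data layout, the imputed target for missing entries (`imputed_value=2.0`) and their small weight (`missing_weight=0.01`) are hypothetical choices for illustration, not definitions or values taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout for this sketch:
#   observed ratings and predictions: {(user, item): value}
#   ranking scores: {user: {item: score}}
Pair = Tuple[str, str]


def top_k_hit_rate(
    scores: Dict[str, Dict[str, float]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of relevant test items that land in the user's top-k list,
    where each list ranks ALL items scored for that user, not only rated ones.

    Assumption: `relevant` holds each user's held-out relevant items (e.g.
    highly rated test items); excluding training items from the ranking is
    left to the caller.
    """
    hits = 0
    total = 0
    for user, items in relevant.items():
        user_scores = scores[user]
        # Rank every candidate item for this user by descending score.
        ranked = sorted(user_scores, key=user_scores.get, reverse=True)
        top_k = set(ranked[:k])
        hits += len(items & top_k)
        total += len(items)
    return hits / total if total else 0.0


def all_ratings_objective(
    predictions: Dict[Pair, float],
    observed: Dict[Pair, float],
    users: List[str],
    items: List[str],
    imputed_value: float = 2.0,    # assumed target for missing ratings
    missing_weight: float = 0.01,  # assumed (small) weight for missing ratings
) -> float:
    """Weighted squared error summed over ALL user-item pairs.

    Observed pairs use their rating as the target with weight 1; missing
    pairs use `imputed_value` as the target with weight `missing_weight`.
    """
    total = 0.0
    for u in users:
        for i in items:
            if (u, i) in observed:
                target, weight = observed[(u, i)], 1.0
            else:
                target, weight = imputed_value, missing_weight
            total += weight * (target - predictions[(u, i)]) ** 2
    return total
```

Giving missing entries a low imputed target and a small weight lets the observed ratings dominate the fit while still nudging unrated items toward low scores, which is one way to act on the abstract's point that missing ratings are informative. In practice such an objective would be minimized by a model (for example, a matrix factorization) rather than merely evaluated; that training loop is omitted here.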


