Improving Collaborative Filtering’s Rating Prediction Coverage in Sparse Datasets through the Introduction of Virtual Near Neighbors

2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA). Pub Date: 2019-07-15. DOI: 10.1109/IISA.2019.8900678

Dionisis Margaris, Dionysios Vasilopoulos, C. Vassilakis, D. Spiliotopoulos


Citations: 7

Abstract

Collaborative filtering creates personalized recommendations by considering ratings entered by users. Collaborative filtering algorithms first detect users with similar tastes by examining the similarity between the ratings submitted so far. Users whose ratings exhibit a high degree of similarity are termed near neighbors; to formulate a recommendation for a user, her near neighbors’ ratings are extracted and form the basis for the recommendation. Collaborative filtering algorithms, however, exhibit the problem commonly referred to as “gray sheep”: for some users no near neighbors can be identified, and hence no personalized recommendations can be computed. The “gray sheep” problem is more severe in sparse datasets, i.e. datasets where the number of ratings is small compared to the number of items and users. In this paper, we address the “gray sheep” problem by introducing the concept of virtual near neighbors and a related algorithm for creating them on the basis of the existing ones. We evaluate the proposed algorithm, termed CFVNN, using eight widely used datasets and two correlation metrics that are widely used in collaborative filtering research, namely the Pearson Correlation Coefficient and the Cosine Similarity. The results show that the proposed algorithm considerably increases the capability of a collaborative filtering system to compute personalized recommendations in sparse datasets, thus efficiently tackling the “gray sheep” problem. In parallel, the CFVNN algorithm improves rating prediction quality, as expressed through the Mean Absolute Error and the Root Mean Square Error metrics.
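
To make the terms in the abstract concrete, the following is a minimal sketch of a plain user-based collaborative filtering baseline. It is not the CFVNN algorithm, whose virtual-neighbor construction the abstract does not detail. The toy ratings, the function names, the positive-similarity threshold for near neighbors, and the mean-centered prediction formula are all assumptions chosen for illustration. The sketch shows how a user without near neighbors ends up with no prediction, and how coverage, MAE and RMSE can be computed over a held-out set.

```python
from math import sqrt

# Toy ratings (user -> {item: rating}); values are invented for illustration.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
    "u4": {"e": 3},  # shares no item with anyone: a "gray sheep" user
}

def mean(user):
    """Average of all ratings entered by the user."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def pearson(u, v):
    """Pearson correlation over co-rated items; None when it is undefined."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return None
    mu, mv = mean(u), mean(v)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    if du == 0 or dv == 0:
        return None
    return num / (du * dv)

def near_neighbors(u, threshold=0.0):
    """Users whose similarity to u exceeds the (assumed) threshold."""
    found = {}
    for v in ratings:
        if v == u:
            continue
        sim = pearson(u, v)
        if sim is not None and sim > threshold:
            found[v] = sim
    return found

def predict(u, item):
    """Mean-centered weighted prediction; None if no near neighbor rated item."""
    nn = {v: s for v, s in near_neighbors(u).items() if item in ratings[v]}
    if not nn:
        return None
    num = sum(s * (ratings[v][item] - mean(v)) for v, s in nn.items())
    return mean(u) + num / sum(nn.values())

def evaluate(held_out):
    """Return (coverage, MAE, RMSE) over (user, item, actual_rating) triples."""
    errors = []
    for u, item, actual in held_out:
        p = predict(u, item)
        if p is not None:
            errors.append(p - actual)
    coverage = len(errors) / len(held_out)
    if not errors:
        return coverage, None, None
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return coverage, mae, rmse

# Hypothetical held-out ratings used only to exercise the metrics.
held_out = [("u1", "d", 4.0), ("u3", "c", 3.0), ("u4", "a", 3.0)]
coverage, mae, rmse = evaluate(held_out)
print(coverage, mae, rmse)
```

In this toy data only `u1` has a positively correlated neighbor, so only one of the three held-out ratings receives a prediction. Coverage measures exactly this fraction, and it is the quantity the paper aims to raise; MAE and RMSE are computed only over the predictions that could be made.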


