Dionisis Margaris, Dionysios Vasilopoulos, C. Vassilakis, D. Spiliotopoulos
Title: Improving Collaborative Filtering’s Rating Prediction Coverage in Sparse Datasets through the Introduction of Virtual Near Neighbors
Published in: 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA)
Publication date: 2019-07-15
DOI: 10.1109/IISA.2019.8900678
Citations: 7
Abstract
Collaborative filtering creates personalized recommendations by considering ratings entered by users. Collaborative filtering algorithms first detect users with similar tastes by examining the similarity between the ratings submitted so far. Users whose ratings are highly similar are termed near neighbors, and in order to formulate a recommendation for a user, her near neighbors’ ratings are extracted and form the basis for the recommendation. Collaborative filtering algorithms, however, exhibit the problem commonly referred to as “gray sheep”: this pertains to the case where, for some users, no near neighbors can be identified, and hence no personalized recommendations can be computed. The “gray sheep” problem is more severe in sparse datasets, i.e. datasets where the number of ratings is small compared to the number of items and users. In this paper, we address the “gray sheep” problem by introducing the concept of virtual near neighbors and a related algorithm for creating them on the basis of the existing ones. We evaluate the proposed algorithm, termed CFVNN, using eight widely used datasets and considering two correlation metrics that are widely used in Collaborative Filtering research, namely the Pearson Correlation Coefficient and the Cosine Similarity. The results show that the proposed algorithm considerably enhances the capability of a Collaborative Filtering system to compute personalized recommendations in the context of sparse datasets, thus efficiently tackling the “gray sheep” problem. In parallel, the CFVNN algorithm achieves improvements in rating prediction quality, as expressed through the Mean Absolute Error and the Root Mean Square Error metrics.
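The notions the abstract relies on (near neighbors, the “gray sheep” case, prediction coverage, MAE and RMSE) can be illustrated with a minimal sketch of plain user-based collaborative filtering. This is not the CFVNN algorithm, whose virtual-neighbor construction the abstract does not detail. The toy ratings, the function names, the similarity threshold of 0, and the choice to compute user means over all of a user's ratings are assumptions made for illustration only.

```python
from math import sqrt

# Made-up toy ratings (user -> {item: rating}); not taken from the paper.
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
    "dave":  {"i5": 3},  # shares no items with anyone: a "gray sheep" user
}

def mean_rating(user):
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def pearson(u, v):
    """Pearson correlation over co-rated items; None when it is undefined."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return None
    mu_u, mu_v = mean_rating(u), mean_rating(v)
    num = sum((ratings[u][i] - mu_u) * (ratings[v][i] - mu_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mu_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mu_v) ** 2 for i in common))
    den = den_u * den_v
    return num / den if den else None

def near_neighbors(u, threshold=0.0):
    """Users whose similarity to u exceeds the (assumed) threshold."""
    result = {}
    for v in ratings:
        if v == u:
            continue
        sim = pearson(u, v)
        if sim is not None and sim > threshold:
            result[v] = sim
    return result

def predict(u, item):
    """Mean-centered weighted prediction; None if no neighbor rated the item."""
    usable = {v: s for v, s in near_neighbors(u).items() if item in ratings[v]}
    if not usable:
        return None
    weighted = sum(s * (ratings[v][item] - mean_rating(v)) for v, s in usable.items())
    return mean_rating(u) + weighted / sum(usable.values())

def coverage(pairs):
    """Fraction of (user, item) pairs for which a prediction can be computed."""
    return sum(predict(u, i) is not None for u, i in pairs) / len(pairs)

def mae(results):
    """Mean Absolute Error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in results) / len(results)

def rmse(results):
    """Root Mean Square Error over (predicted, actual) pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in results) / len(results))

print(near_neighbors("alice"))   # only bob correlates positively with alice
print(predict("alice", "i4"))    # a value is computed from bob's rating
print(predict("dave", "i1"))     # None: dave has no near neighbors
print(coverage([("alice", "i4"), ("dave", "i1")]))
```

In this toy data, dave has no near neighbors, so no prediction can be computed for him and coverage over the two requested pairs is 0.5. That is the gap the paper aims to close by adding virtual near neighbors.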