White Box: On the Prediction of Collaborative Filtering Recommendation Systems’ Performance

IF 3.9 3区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Iulia Paun, Yashar Moshfeghi, Nikos Ntarmos
{"title":"White Box: On the Prediction of Collaborative Filtering Recommendation Systems’ Performance","authors":"Iulia Paun, Yashar Moshfeghi, Nikos Ntarmos","doi":"https://dl.acm.org/doi/10.1145/3554979","DOIUrl":null,"url":null,"abstract":"<p>Collaborative Filtering (CF) recommendation algorithms are a popular solution to the information overload problem, aiding users in the item selection process. Relevant research has long focused on refining and improving these models to produce better (more effective) recommendations, and has converged on a methodology to predict their effectiveness on target datasets by evaluating them on random samples of the latter. However, predicting the efficiency of the solutions—especially with regard to their time- and resource-hungry training phase, whose requirements dwarf those of the prediction/recommendation phase—has received little to no attention in the literature. This article addresses this gap for a number of representative and highly popular CF models, including algorithms based on matrix factorization, k-nearest neighbors, co-clustering, and slope one schemes. To this end, we first study the computational complexity of the training phase of said CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time and memory usage of their training phase. Our contributions further include an adaptive sampling strategy, to address the tradeoff between resource usage costs and prediction accuracy, and a framework that quantifies both the efficiency and effectiveness of CF. Finally, a systematic experimental evaluation demonstrates that our method outperforms state-of-the-art regression schemes by a considerable margin, with an overhead that is a small fraction of the overall requirements of CF training.</p>","PeriodicalId":50911,"journal":{"name":"ACM Transactions on Internet Technology","volume":"9 1","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Internet Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/https://dl.acm.org/doi/10.1145/3554979","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Collaborative Filtering (CF) recommendation algorithms are a popular solution to the information overload problem, aiding users in the item selection process. Relevant research has long focused on refining and improving these models to produce better (more effective) recommendations, and has converged on a methodology to predict their effectiveness on target datasets by evaluating them on random samples of the latter. However, predicting the efficiency of the solutions—especially with regard to their time- and resource-hungry training phase, whose requirements dwarf those of the prediction/recommendation phase—has received little to no attention in the literature. This article addresses this gap for a number of representative and highly popular CF models, including algorithms based on matrix factorization, k-nearest neighbors, co-clustering, and slope one schemes. To this end, we first study the computational complexity of the training phase of said CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time and memory usage of their training phase. Our contributions further include an adaptive sampling strategy, to address the tradeoff between resource usage costs and prediction accuracy, and a framework that quantifies both the efficiency and effectiveness of CF. Finally, a systematic experimental evaluation demonstrates that our method outperforms state-of-the-art regression schemes by a considerable margin, with an overhead that is a small fraction of the overall requirements of CF training.

白盒:协同过滤推荐系统性能的预测
协同过滤(CF)推荐算法是解决信息过载问题的一种流行方法,它可以帮助用户在项目选择过程中进行选择。长期以来,相关研究一直专注于精炼和改进这些模型,以产生更好(更有效)的建议,并汇集了一种方法,通过在后者的随机样本上评估它们来预测它们在目标数据集上的有效性。然而,预测解决方案的效率——特别是在时间和资源匮乏的训练阶段,其需求使预测/推荐阶段相形见绌——在文献中几乎没有得到关注。本文为许多具有代表性和非常流行的CF模型解决了这一差距,包括基于矩阵分解、k近邻、共聚类和斜率为1的方案的算法。为此,我们首先研究了所述CF模型训练阶段的计算复杂度,并推导了时间和空间复杂度方程。然后,利用输入的特征和上述方程,我们提出了一种预测其训练阶段的处理时间和内存使用的方法。我们的贡献还包括自适应采样策略,以解决资源使用成本和预测准确性之间的权衡,以及量化CF的效率和有效性的框架。最后,系统的实验评估表明,我们的方法在相当大的范围内优于最先进的回归方案,开销仅占CF训练总体需求的一小部分。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
ACM Transactions on Internet Technology
ACM Transactions on Internet Technology 工程技术-计算机:软件工程
CiteScore
10.30
自引率
1.90%
发文量
137
审稿时长
>12 weeks
期刊介绍: ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, performance and scalability etc. TOIT will cover the results and roles of the individual disciplines and the relationshipsamong them.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信