White Box: On the Prediction of Collaborative Filtering Recommendation Systems’ Performance

IF 4.1 · Zone 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology Pub Date : 2023-02-23 DOI:https://dl.acm.org/doi/10.1145/3554979

Iulia Paun, Yashar Moshfeghi, Nikos Ntarmos


Citations: 0

Abstract


White Box: On the Prediction of Collaborative Filtering Recommendation Systems’ Performance

Collaborative Filtering (CF) recommendation algorithms are a popular solution to the information overload problem, aiding users in the item selection process. Relevant research has long focused on refining and improving these models to produce better (more effective) recommendations, and has converged on a methodology to predict their effectiveness on target datasets by evaluating them on random samples of the latter. However, predicting the efficiency of the solutions—especially with regard to their time- and resource-hungry training phase, whose requirements dwarf those of the prediction/recommendation phase—has received little to no attention in the literature. This article addresses this gap for a number of representative and highly popular CF models, including algorithms based on matrix factorization, k-nearest neighbors, co-clustering, and slope one schemes. To this end, we first study the computational complexity of the training phase of said CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time and memory usage of their training phase. Our contributions further include an adaptive sampling strategy, to address the tradeoff between resource usage costs and prediction accuracy, and a framework that quantifies both the efficiency and effectiveness of CF. Finally, a systematic experimental evaluation demonstrates that our method outperforms state-of-the-art regression schemes by a considerable margin, with an overhead that is a small fraction of the overall requirements of CF training.
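The idea of predicting training time from a complexity equation plus input characteristics can be illustrated with a minimal sketch. This is not the authors' method: it assumes, for illustration only, that training time for SGD-based matrix factorization is dominated by a term proportional to epochs × ratings × latent factors, and it calibrates a single constant from timings measured on small samples. All names (`TrainingRun`, `sgd_mf_cost`, `fit_seconds_per_unit`, `predict_seconds`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """One timed training run on a data sample (hypothetical record type)."""
    n_ratings: int   # observed ratings in the sample
    n_factors: int   # latent dimensionality of the factorization
    n_epochs: int    # passes over the sample
    seconds: float   # measured wall-clock training time


def sgd_mf_cost(n_ratings: int, n_factors: int, n_epochs: int) -> int:
    """Assumed dominant cost term: O(epochs * ratings * factors)."""
    return n_epochs * n_ratings * n_factors


def fit_seconds_per_unit(runs: list[TrainingRun]) -> float:
    """Least-squares fit through the origin of seconds = c * cost."""
    costs = [sgd_mf_cost(r.n_ratings, r.n_factors, r.n_epochs) for r in runs]
    den = sum(c * c for c in costs)
    if den == 0:
        raise ValueError("need at least one run with non-zero cost")
    num = sum(c * r.seconds for c, r in zip(costs, runs))
    return num / den


def predict_seconds(c: float, n_ratings: int, n_factors: int, n_epochs: int) -> float:
    """Extrapolate the calibrated constant to a larger training job."""
    return c * sgd_mf_cost(n_ratings, n_factors, n_epochs)
```

In use, one would time a few runs on small random samples, pass them to `fit_seconds_per_unit`, and call `predict_seconds` with the full dataset's characteristics. A single-constant model like this ignores lower-order terms and hardware effects, so it is only a starting point for the kind of prediction the abstract describes.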


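Source journal: ACM Transactions on Internet Technology (Engineering & Technology; Computer Science: Software Engineering)

CiteScore: 10.30

Self-citation rate: 1.90%

Articles published: 137

Review time: >12 weeks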

About the journal: ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines, including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, and performance and scalability. TOIT covers the results and roles of the individual disciplines and the relationships among them.