Scalable Task-Parallel SGD on Matrix Factorization in Multicore Architectures

2015 IEEE International Parallel and Distributed Processing Symposium Workshop. Pub Date: 2015-05-25. DOI: 10.1109/IPDPSW.2015.135

Yusuke Nishioka, K. Taura


Citations: 8

Abstract


Scalable Task-Parallel SGD on Matrix Factorization in Multicore Architectures

Recommendation is an indispensable technique, especially in e-commerce services such as Amazon or Netflix, for presenting users with items they are likely to prefer. Matrix factorization is a well-known recommendation algorithm that estimates affinities between users and items solely from ratings explicitly given by users. To handle large amounts of data, stochastic gradient descent (SGD), an online loss minimization algorithm, can be applied to matrix factorization. SGD is effective in terms of both convergence speed and memory consumption, but it is difficult to parallelize because it is inherently sequential. FPSGD by Zhuang et al. is an existing parallel SGD method for matrix factorization that divides the rating matrix into many small blocks. Threads work on blocks chosen so that no two threads update the same rows or columns of the factor matrices. Because of this technique, FPSGD achieves higher convergence speed than other existing methods. Still, as we demonstrate in this paper, FPSGD does not scale beyond 32 cores on the 1.4 GB Netflix dataset, because assigning non-conflicting blocks to threads requires a lock operation. In this work, we propose an alternative approach to SGD for matrix factorization that uses a task-parallel programming model. As a result, we have overcome the bottleneck of FPSGD and achieved higher scalability with 64 cores.
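To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the common regularized squared-error formulation of matrix factorization; the function names, the learning rate `lr`, the regularization weight `reg`, and the uniform `grid` partition are all hypothetical choices made for illustration. The first function performs one SGD update for a single rating. The other two show why blocking helps: two blocks of the rating matrix can be processed at the same time only if they share neither a row block nor a column block.

```python
def sgd_step(P, Q, u, i, r, lr=0.01, reg=0.05):
    """One SGD update for rating r given by user u to item i.

    P and Q are lists of latent-factor vectors (one per user / item).
    lr and reg are assumed hyperparameters, not values from the paper.
    Returns the prediction error measured before the update.
    """
    p, q = P[u], Q[i]
    err = r - sum(a * b for a, b in zip(p, q))
    for k in range(len(p)):
        pk, qk = p[k], q[k]  # keep old values so both updates use them
        p[k] += lr * (err * qk - reg * pk)
        q[k] += lr * (err * pk - reg * qk)
    return err


def block_of(u, i, n_users, n_items, grid):
    """Map rating (u, i) to its (row block, column block) in a grid x grid split."""
    return (u * grid // n_users, i * grid // n_items)


def conflict_free(blocks):
    """True if no two blocks share a row block or a column block.

    Such blocks touch disjoint rows of P and disjoint rows of Q,
    so separate threads could update them without interfering.
    """
    rows = [b[0] for b in blocks]
    cols = [b[1] for b in blocks]
    return len(set(rows)) == len(rows) and len(set(cols)) == len(cols)


if __name__ == "__main__":
    P = [[0.1, 0.1]]
    Q = [[0.1, 0.1]]
    first = sgd_step(P, Q, 0, 0, 1.0)
    second = sgd_step(P, Q, 0, 0, 1.0)
    print(abs(second) < abs(first))            # error shrinks after an update
    print(conflict_free([(0, 0), (1, 1)]))     # diagonal blocks do not conflict
    print(conflict_free([(0, 0), (0, 1)]))     # same row block: conflict
```

In a scheme of this kind, some component has to hand out conflict-free blocks to idle threads. The abstract identifies the lock needed for that assignment as the scalability bottleneck of FPSGD; the sketch does not model the scheduler or the task-parallel alternative proposed in the paper.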

