Efficient Distributed Matrix Factorization Alternating Least Squares (EDMFALS) for Recommendation Systems Using Spark

Impact factor: 1.0 | JCR quartile: Q3 | Category: Information Science & Library Science

Journal of Information & Knowledge Management | Pub Date: 2021-12-04 | DOI: 10.1142/s0219649222500125

R. R. S. Ravi Kumar, G. Appa Rao, S. Anuradha


Citations: 0

Abstract

With the emergence of e-commerce and social networking systems, the use of recommendation systems gained popularity to predict the user ratings of an item. Since the large volume of data is generated from various sources at high speed, predicting the ratings accurately in real-time adds enormous benefit to the users while choosing the correct item. So a recommendation system must be capable enough to predict the rating accurately when the data are large. Apache Spark is a distributed framework well suited for processing large datasets and real-time data streams. In this paper, we propose an efficient matrix factorisation algorithm based on Spark MLlib alternating least squares (ALS) for collaborative filtering. The optimisations used for the proposed algorithm using Tungsten improved the performance of the algorithm significantly while doing the predictions. The experimental results prove that the proposed work is significantly faster for top-N recommendations and rating predictions compared with the existing works.
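The abstract does not spell out the EDMFALS algorithm or its Tungsten-based optimisations, so the following is only a minimal single-machine sketch of the general alternating least squares idea it builds on: fix the item factors and solve a small ridge regression per user, then fix the user factors and do the same per item. It is not the authors' method and not Spark MLlib's implementation. All function names (`als`, `update_factors`, `predict`, `top_n`, `rmse`) and parameter defaults are hypothetical, and the plain ridge penalty `reg` is an assumption made for simplicity.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def update_factors(ratings_by_row, fixed, rank, reg):
    """One half-step of ALS: ridge-regress each row's factors on the fixed side's factors."""
    updated = {}
    for row_id, observed in ratings_by_row.items():
        # Normal equations (F^T F + reg * I) x = F^T r over the observed ratings only.
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for col_id, r in observed:
            f = fixed[col_id]
            for i in range(rank):
                b[i] += r * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        updated[row_id] = solve(a, b)
    return updated


def als(ratings, rank=2, reg=0.1, iterations=10, seed=0):
    """Factorise (user, item, rating) triples into user and item factor dictionaries."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update_factors(by_user, items, rank, reg)  # items held fixed
        items = update_factors(by_item, users, rank, reg)  # users held fixed
    return users, items


def predict(users, items, u, i):
    """Predicted rating is the dot product of the user and item factor vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))


def top_n(users, items, u, n):
    """Return the n items with the highest predicted rating for user u."""
    return sorted(items, key=lambda i: predict(users, items, u, i), reverse=True)[:n]


def rmse(ratings, users, items):
    """Root mean squared error of the predictions on the given triples."""
    se = sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Each per-user and per-item solve is independent of the others, which is what makes ALS a natural fit for a distributed framework such as Spark. In Spark itself one would normally use the library's own `ALS` estimator (with parameters such as `rank`, `maxIter` and `regParam`) rather than hand-written code like this.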


Source journal

Journal of Information & Knowledge Management (Information Science & Library Science)

CiteScore: 2.40

Self-citation rate: 25.00%

Journal description: JIKM is a refereed journal published quarterly by World Scientific and dedicated to the exchange of the latest research and practical information in the field of information processing and knowledge management. The journal publishes original research and case studies by academic, business and government contributors on all aspects of information processing, information management, knowledge management, tools, techniques and technologies, knowledge creation and sharing, best practices, policies and guidelines.

JIKM is an international journal aimed at providing quality information to subscribers around the world. Managed by an international editorial board, JIKM positions itself as one of the leading scholarly journals in the field of information processing and knowledge management. It is a good reference for both information and knowledge management professionals.

The journal covers key areas in the field of information and knowledge management. Research papers, practical applications, working papers, and case studies are invited in the following areas:

- Business intelligence and competitive intelligence
- Communication and organizational culture
- e-Learning and lifelong learning
- Electronic records and document management
- Information processing and information management
- Information organization, taxonomies and ontology
- Intellectual capital
- Knowledge creation, retention, sharing and transfer
- Knowledge discovery, data and text mining
- Knowledge management and innovations
- Knowledge management education
- Knowledge management tools and technologies
- Knowledge management measurements
- Knowledge professionals and leadership
- Learning organization and organizational learning
- Practical implementations of knowledge management