Improved Hybrid Collaborative Filtering Algorithm Based on Spark Platform

JCR category: Multidisciplinary (Q3)
Zhen YOU, Hongwen HU, Yutao WANG, Jinyun XUE, Xinwu YI
DOI: 10.1051/wujns/2023285451
Journal: Wuhan University Journal of Natural Sciences, vol. 29, no. 1
Published: 2023-10-01
Citations: 0

Abstract

An improved Hybrid Collaborative Filtering algorithm (H-CF) is proposed to address three weaknesses of traditional collaborative filtering: data sparsity, low recommendation accuracy, and poor scalability. The core of H-CF is a linear weighted hybrid of the Latent Factor Model (LFM) and the Improved Item Clustering and Similarity Calculation Collaborative Filtering Algorithm (ITCSCF). First, items are clustered along their attribute dimensions, which speeds up the computation of the nearest-neighbor set. Next, H-CF revises the rating-similarity formula by penalizing popular items and optimizing the treatment of unpopular items; this makes the similarity scores more reasonable and reduces the impact of data sparsity. A weighting function then combines the improved algorithms, and its balance factor is adjusted dynamically to obtain the optimal recommendation list. To meet real-time and scalability requirements, the algorithm runs on Spark, a distributed cluster computing framework for big data. Experiments on the public MovieLens dataset compared the improved algorithm against the algorithm before the improvements and against the same algorithm running on a single machine. The results show that the improved algorithm performs better with respect to data sparsity, recommendation personalization, accuracy, recall, and efficiency.
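The item-clustering and popularity-penalized similarity steps can be illustrated with a minimal sketch. The abstract does not give ITCSCF's exact formulas, so this block assumes two stand-ins: items are grouped by their first attribute (for MovieLens, a genre), and item-to-item similarity uses co-occurrence counts weighted by inverse user frequency, divided by |N(i)|^(1-alpha) * |N(j)|^alpha so that an exponent `alpha` > 0.5 penalizes popular candidate items. The names `cluster_items_by_attribute`, `item_similarity`, `recommend`, and `alpha` are hypothetical, not taken from the paper.

```python
import math
from collections import defaultdict


def cluster_items_by_attribute(item_attrs):
    """Group items by their first attribute (e.g. a movie's first genre).

    Hypothetical stand-in for attribute-based item clustering.
    """
    clusters = defaultdict(set)
    for item, attrs in item_attrs.items():
        clusters[attrs[0]].add(item)
    return dict(clusters)


def item_similarity(user_items, clusters, alpha=0.6):
    """Popularity-penalised item-item similarity, computed only inside clusters.

    w[i][j] = sum over users u who have both i and j of 1 / log(1 + |N(u)|),
              divided by |N(i)| ** (1 - alpha) * |N(j)| ** alpha.
    With alpha > 0.5 a popular candidate item j is penalised more heavily;
    alpha = 0.5 gives the usual cosine normalisation.
    """
    cluster_of = {i: c for c, members in clusters.items() for i in members}
    popularity = defaultdict(int)  # |N(i)|: number of users who have item i
    co = defaultdict(lambda: defaultdict(float))
    for items in user_items.values():
        if not items:
            continue
        # Inverse user frequency: very active users are weaker evidence of similarity.
        weight = 1.0 / math.log(1 + len(items))
        for i in items:
            popularity[i] += 1
            for j in items:
                # Only compare items in the same cluster, shrinking the neighbour search.
                if i != j and i in cluster_of and cluster_of.get(j) == cluster_of[i]:
                    co[i][j] += weight
    return {
        i: {j: c / (popularity[i] ** (1 - alpha) * popularity[j] ** alpha)
            for j, c in row.items()}
        for i, row in co.items()
    }


def recommend(user, user_items, sim, n=10):
    """Rank unseen items by their summed similarity to the items the user already has."""
    seen = user_items[user]
    scores = defaultdict(float)
    for i in seen:
        for j, w in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    user_items = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"a", "c"}}
    item_attrs = {"a": ["Drama"], "b": ["Drama"], "c": ["Comedy"]}
    sim = item_similarity(user_items, cluster_items_by_attribute(item_attrs))
    print(recommend("u3", user_items, sim))
```

Restricting pairs to a shared cluster is what reduces the nearest-neighbor computation; the price is that items in different clusters can never recommend each other, which is one reason a hybrid with a global model such as LFM is attractive.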
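LFM, the other component of the hybrid, factorizes the rating matrix into user and item latent vectors so that a rating is approximated by their dot product. Below is a minimal single-machine sketch of a plain latent factor model (no bias terms) trained by stochastic gradient descent, assuming explicit ratings given as `(user, item, rating)` triples. The function names and the hyperparameters `k`, `lr`, `reg`, and `epochs` are illustrative assumptions, not values from the paper; on Spark, a comparable model is commonly fitted with a distributed solver such as MLlib's ALS instead of this loop.

```python
import math
import random


def train_lfm(ratings, k=8, epochs=100, lr=0.02, reg=0.05, seed=0):
    """Fit r_ui ~ p_u . q_i by SGD with L2 regularisation.

    ratings: iterable of (user, item, rating) triples.
    Returns (P, Q): dicts mapping users / items to k-dimensional latent vectors.
    """
    rng = random.Random(seed)
    data = list(ratings)
    P, Q = {}, {}
    for u, i, _ in data:
        # Small random initialisation breaks the symmetry between latent factors.
        if u not in P:
            P[u] = [rng.gauss(0.0, 0.1) for _ in range(k)]
        if i not in Q:
            Q[i] = [rng.gauss(0.0, 0.1) for _ in range(k)]
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pf, qf = pu[f], qi[f]
                # Gradient step on squared error plus L2 penalty for both factors.
                pu[f] += lr * (err * qf - reg * pf)
                qi[f] += lr * (err * pf - reg * qf)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root-mean-square error of the model on a list of rating triples."""
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))


if __name__ == "__main__":
    ratings = [("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4), ("u2", "m3", 2)]
    P, Q = train_lfm(ratings, k=4, epochs=200)
    print(round(rmse(P, Q, ratings), 3), round(predict(P, Q, "u1", "m3"), 2))
```

In a hybrid, `predict` evaluated on a user's unseen items would supply the LFM score list that is later blended with the item-based CF scores.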
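The linear weighted combination and its balance factor can be sketched as follows. The abstract does not specify the weighting function or how the balance factor is adjusted, so this block assumes min-max normalization of each score source, a hybrid score `lam * CF + (1 - lam) * LFM`, and a grid search that picks the `lam` maximizing mean recall@N on held-out interactions. All function names here (`normalise`, `hybrid_scores`, `top_n`, `tune_balance_factor`) are hypothetical.

```python
def normalise(scores):
    """Min-max scale a {item: score} dict to [0, 1] so the two sources are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    return {i: (s - lo) / span for i, s in scores.items()}


def hybrid_scores(cf_scores, lfm_scores, lam):
    """Linear weighted hybrid: lam * CF + (1 - lam) * LFM on normalised scores."""
    cf, lf = normalise(cf_scores), normalise(lfm_scores)
    return {i: lam * cf.get(i, 0.0) + (1 - lam) * lf.get(i, 0.0)
            for i in set(cf) | set(lf)}


def top_n(scores, n):
    """Highest-scoring n items; ties broken by item id for determinism."""
    return [i for i, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def tune_balance_factor(cf_by_user, lfm_by_user, held_out, n=10, grid=None):
    """Pick the balance factor lam that maximises mean recall@n on held-out items."""
    if not held_out:
        raise ValueError("held_out must contain at least one user")
    grid = grid if grid is not None else [x / 10 for x in range(11)]
    best_lam, best_recall = None, -1.0
    for lam in grid:
        total = 0.0
        for u, relevant in held_out.items():
            recs = top_n(hybrid_scores(cf_by_user.get(u, {}), lfm_by_user.get(u, {}), lam), n)
            total += len(set(recs) & relevant) / len(relevant) if relevant else 0.0
        mean_recall = total / len(held_out)
        if mean_recall > best_recall:
            best_lam, best_recall = lam, mean_recall
    return best_lam, best_recall
```

Normalizing first matters because item-based CF sums of similarities and LFM rating predictions live on different scales; without it, `lam` would mostly compensate for scale rather than express a genuine preference between the two models.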
Source journal
Wuhan University Journal of Natural Sciences (Multidisciplinary)
CiteScore: 0.40
Self-citation rate: 0.00%
Articles published: 2485
About the journal: Wuhan University Journal of Natural Sciences aims to promote rapid communication and exchange between the wider world and Wuhan University, as well as other Chinese universities and academic institutions. It mainly reports the latest advances across many disciplines of scientific research at Chinese universities and academic institutions, and it also publishes papers presented at conferences in China and abroad. Its multidisciplinary scope is reflected in the wide range of articles from leading Chinese scholars. The journal further aims to introduce Chinese academic achievements to the international community by demonstrating the significance of Chinese scientific research.