Zhen YOU, Hongwen HU, Yutao WANG, Jinyun XUE, Xinwu YI
{"title":"基于Spark平台的改进混合协同滤波算法","authors":"Zhen YOU, Hongwen HU, Yutao WANG, Jinyun XUE, Xinwu YI","doi":"10.1051/wujns/2023285451","DOIUrl":null,"url":null,"abstract":"An improved Hybrid Collaborative Filtering algorithm (H-CF) is proposed, addressing the issues of data sparsity, low recommendation accuracy, and poor scalability present in traditional collaborative filtering algorithms. The core of H-CF is a linear weighted hybrid algorithm based on the Latent Factor Model (LFM) and the Improved Item Clustering and Similarity Calculation Collaborative Filtering Algorithm (ITCSCF). To begin with, the items are clustered based on their attribute dimension, which accelerates the computation of the nearest neighbor set. Subsequently, H-CF enhances the formula for scoring similarity by penalizing popular items and optimizing unpopular items. This improvement enhances the rationality of scoring similarity and reduces the impact of data sparseness. Furthermore, a weighting function is employed to combine the various improved algorithms. The balance factor of the weighting function is dynamically adjusted to attain the optimal recommendation list. To address the real-time and scalability concerns, the algorithm leverages the Spark big data distributed cluster computing framework. Experiments were conducted using the public dataset MovieLens, where the improved algorithm's performance was compared against the algorithm before enhancement and the algorithm running on a single machine. The experimental results demonstrate that the improved algorithm outperforms in terms of data sparsity, recommendation personalization, accuracy, recall, and efficiency.","PeriodicalId":23976,"journal":{"name":"Wuhan University Journal of Natural Sciences","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improved Hybrid Collaborative Fitering Algorithm Based on Spark Platform\",\"authors\":\"Zhen YOU, Hongwen HU, Yutao WANG, Jinyun XUE, Xinwu YI\",\"doi\":\"10.1051/wujns/2023285451\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"An improved Hybrid Collaborative Filtering algorithm (H-CF) is proposed, addressing the issues of data sparsity, low recommendation accuracy, and poor scalability present in traditional collaborative filtering algorithms. The core of H-CF is a linear weighted hybrid algorithm based on the Latent Factor Model (LFM) and the Improved Item Clustering and Similarity Calculation Collaborative Filtering Algorithm (ITCSCF). To begin with, the items are clustered based on their attribute dimension, which accelerates the computation of the nearest neighbor set. Subsequently, H-CF enhances the formula for scoring similarity by penalizing popular items and optimizing unpopular items. This improvement enhances the rationality of scoring similarity and reduces the impact of data sparseness. Furthermore, a weighting function is employed to combine the various improved algorithms. The balance factor of the weighting function is dynamically adjusted to attain the optimal recommendation list. To address the real-time and scalability concerns, the algorithm leverages the Spark big data distributed cluster computing framework. Experiments were conducted using the public dataset MovieLens, where the improved algorithm's performance was compared against the algorithm before enhancement and the algorithm running on a single machine. The experimental results demonstrate that the improved algorithm outperforms in terms of data sparsity, recommendation personalization, accuracy, recall, and efficiency.\",\"PeriodicalId\":23976,\"journal\":{\"name\":\"Wuhan University Journal of Natural Sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Wuhan University Journal of Natural Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1051/wujns/2023285451\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Multidisciplinary\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Wuhan University Journal of Natural Sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1051/wujns/2023285451","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Multidisciplinary","Score":null,"Total":0}
Improved Hybrid Collaborative Fitering Algorithm Based on Spark Platform
An improved Hybrid Collaborative Filtering algorithm (H-CF) is proposed, addressing the issues of data sparsity, low recommendation accuracy, and poor scalability present in traditional collaborative filtering algorithms. The core of H-CF is a linear weighted hybrid algorithm based on the Latent Factor Model (LFM) and the Improved Item Clustering and Similarity Calculation Collaborative Filtering Algorithm (ITCSCF). To begin with, the items are clustered based on their attribute dimension, which accelerates the computation of the nearest neighbor set. Subsequently, H-CF enhances the formula for scoring similarity by penalizing popular items and optimizing unpopular items. This improvement enhances the rationality of scoring similarity and reduces the impact of data sparseness. Furthermore, a weighting function is employed to combine the various improved algorithms. The balance factor of the weighting function is dynamically adjusted to attain the optimal recommendation list. To address the real-time and scalability concerns, the algorithm leverages the Spark big data distributed cluster computing framework. Experiments were conducted using the public dataset MovieLens, where the improved algorithm's performance was compared against the algorithm before enhancement and the algorithm running on a single machine. The experimental results demonstrate that the improved algorithm outperforms in terms of data sparsity, recommendation personalization, accuracy, recall, and efficiency.
期刊介绍:
Wuhan University Journal of Natural Sciences aims to promote rapid communication and exchange between the World and Wuhan University, as well as other Chinese universities and academic institutions. It mainly reflects the latest advances being made in many disciplines of scientific research in Chinese universities and academic institutions. The journal also publishes papers presented at conferences in China and abroad. The multi-disciplinary nature of Wuhan University Journal of Natural Sciences is apparent in the wide range of articles from leading Chinese scholars. This journal also aims to introduce Chinese academic achievements to the world community, by demonstrating the significance of Chinese scientific investigations.