Research on Collaborative Filtering Recommendation Algorithm Based on Mahout

DEStech Transactions on Environment, Energy and Earth Sciences. Pub Date: 2020-04-01. DOI: 10.12783/dteees/peems2019/34001

Hui Cao, Liyang Yan


Citations: 1

Abstract

This paper studies the recommendation algorithms of the Mahout machine learning platform. It analyzes the principles of the current mainstream recommendation algorithms, focusing on item-based collaborative filtering, and implements a recommender for the Book-Crossing data set using the collaborative filtering algorithms provided by Mahout. The recommendation results are then compared and analyzed under different similarity measures and other parameters of the general recommendation algorithm.

Introduction

With the rapid development of Internet technology and the progress of smart terminal devices, the mobile Internet has risen. It makes publishing and sharing information more convenient, and at the same time it exposes people to a large volume of data. While we enjoy the convenience of the information age, we also produce many kinds of information ourselves. A traditional search engine suits the scenario in which users know what they need and can express it as keywords. When users cannot express their needs, or have no clear and effective search terms, recommendation systems make up for this shortcoming of traditional search engines. A recommendation system uses a recommendation algorithm to model the user's preferences and, from that model, predicts the items or information the user may be interested in and recommends them to the user. Item-based collaborative filtering is one of the most widely used and effective recommendation algorithms. Recommendation systems have gradually become a core feature of IT companies that rely on information and data, such as Taobao, Toutiao (Today's Headlines) and NetEase Cloud Music. Recommendation algorithms have developed rapidly, from collaborative filtering to latent factor (implicit semantic) models and then to deep learning models.
The goal of a recommendation system is to predict users' preferences accurately, to achieve the best recommendation effect by coordinating algorithms, system functions and user experience, and to make the consumer's experience more intelligent and user-friendly. This paper analyzes the Mahout recommendation algorithms and, taking a book recommendation system as an example, compares the results of the recommendation algorithm under different parameters.

Recommendation Based on Collaborative Filtering

Collaborative filtering is one of the most mature families of algorithms in recommendation systems. Its core idea is to use user behavior data to extract user features and, by computing the correlations between users and items, to discover new user-item associations to recommend to the current user. The mainstream collaborative filtering algorithms are user-based collaborative filtering (User-Based CF) and item-based collaborative filtering (Item-Based CF). Both are introduced below.

User-Based CF: recommend to user A the items that interest user B and that user A has not browsed. When the system generates recommendations for user A, it uses users' historical behavior to find a user B whose preferences are similar to A's, and then recommends to A the items in B's item set that A has not purchased. The algorithm has two steps: first, find the users B whose preferences are similar to A's; second, recommend to A the items of B that A has not purchased.

Item-Based CF: recommend to user A an item B that is similar to an item A the user bought before. The similarity between items is not computed from their content attributes; it is computed from users' historical behavior.
If most of the people who like item A also like item B, then item A and item B are considered similar, and item B is recommended to someone who likes item A but has not yet chosen item B. The algorithm has two steps: first, compute the similarity between items; second, generate a recommendation list for the user from these similarities and the user's historical behavior.
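
The two-step procedures described above for User-Based CF and Item-Based CF can be illustrated with a minimal Python sketch. It does not use Mahout's API. The toy purchase history, the function names and the choice of cosine similarity over binary "liked" sets are all assumptions made for illustration; the paper itself compares several similarity measures provided by Mahout.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data (not from the paper): user -> set of liked items.
HISTORY = {
    "A": {"book1", "book2"},
    "B": {"book1", "book2", "book3"},
    "C": {"book2", "book3"},
    "D": {"book4"},
}


def cosine(s, t):
    """Cosine similarity between two sets treated as binary vectors."""
    if not s or not t:
        return 0.0
    return len(s & t) / sqrt(len(s) * len(t))


def user_based(history, target, k=2):
    """User-Based CF with k nearest neighbours (k is an assumed parameter)."""
    # Step 1: find the users whose item sets are most similar to the target's.
    neighbours = sorted(
        ((cosine(history[target], items), user)
         for user, items in history.items() if user != target),
        reverse=True,
    )[:k]
    # Step 2: score the neighbours' items that the target does not have yet.
    scores = defaultdict(float)
    for sim, user in neighbours:
        if sim > 0:
            for item in history[user] - history[target]:
                scores[item] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))


def item_based(history, target):
    """Item-Based CF: similarity comes from behaviour, not item content."""
    # Step 1: item-item similarity, computed from the set of users per item.
    fans = defaultdict(set)
    for user, items in history.items():
        for item in items:
            fans[item].add(user)
    # Step 2: score unseen items by their similarity to the target's items.
    scores = defaultdict(float)
    for owned in history[target]:
        for candidate in fans:
            if candidate not in history[target]:
                scores[candidate] += cosine(fans[owned], fans[candidate])
    ranked = [i for i in scores if scores[i] > 0]
    return sorted(ranked, key=lambda i: (-scores[i], i))


print(user_based(HISTORY, "A"))  # ['book3']
print(item_based(HISTORY, "A"))  # ['book3']
```

On this toy data both methods recommend book3 to user A: users B and C overlap with A and both own book3, and book3 shares fans with book1 and book2. User D shares no items with anyone, so neither method returns a recommendation for D.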

