Web users browsing behavior prediction by implementing support vector machines in MapReduce using cloud based Hadoop
Pradipsinh K. Chavda, J. S. Dhobi
2015 5th Nirma University International Conference on Engineering (NUiCONE), November 2015
DOI: 10.1109/NUICONE.2015.7449648
The motivation behind this work is that predicting web users' browsing behavior reduces users' access time and avoids visits to unnecessary pages, which eases network traffic. This research introduces parallel Support Vector Machines (SVMs) for web page prediction. The web holds an enormous amount of data that grows exponentially, but training an SVM on large data takes a very long time. SVMs suffer from widely recognized scalability problems in both memory requirements and computation time when the input dataset is large. To address this, we train the SVM model with the MapReduce programming model of the Hadoop framework, since MapReduce can rapidly process large amounts of data in parallel. MapReduce works in tandem with the Hadoop Distributed File System (HDFS). The proposed approach addresses the scalability problem of the standard SVM algorithm. We evaluate its performance on the Amazon EC2 cloud using cloud-based Hadoop. Our experiments show that the approach is effective in terms of training time and also reduces preprocessing time. We find that as the number of nodes increases, the training time of the proposed algorithm decreases. We also verified that parallelizing SMO (Sequential Minimal Optimization) has no negative effect on accuracy compared with the standard approach.
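The abstract does not say exactly how SVM training is divided among MapReduce tasks. The sketch below assumes one common scheme, a single-level "cascade":

- Each mapper runs a simplified SMO (linear kernel) on its own data partition and emits only that partition's support vectors.
- One reducer retrains on the union of those support vectors.

Hadoop is not used here. The map and reduce phases are simulated with plain Python calls. The names `smo_train`, `map_partition` and `reduce_support_vectors` are hypothetical, as are the toy 2-D feature vectors standing in for browsing-session features and the parameter values (`C`, `tol`, `max_passes`). None of this is the authors' code.

```python
# Hypothetical sketch: simulates one MapReduce round of cascade-style SVM
# training in-process (no Hadoop); not the paper's implementation.
import random
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[Vector, int]  # (features, label in {-1, +1})


def dot(a: Vector, b: Vector) -> float:
    """Linear kernel K(a, b) = a . b."""
    return sum(p * q for p, q in zip(a, b))


def smo_train(data: List[Sample], C: float = 1.0, tol: float = 1e-3,
              max_passes: int = 5, max_iter: int = 10_000, seed: int = 0):
    """Simplified SMO for a linear-kernel SVM; returns (alphas, bias)."""
    rng = random.Random(seed)
    X = [x for x, _ in data]
    y = [label for _, label in data]
    m = len(data)
    alpha = [0.0] * m
    b = 0.0

    def f(x: Vector) -> float:  # decision function; reads current alpha and b
        return sum(alpha[k] * y[k] * dot(X[k], x) for k in range(m)) + b

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(m):
            e_i = f(X[i]) - y[i]
            # Only optimize alpha_i if it violates the KKT conditions.
            if (y[i] * e_i < -tol and alpha[i] < C) or (y[i] * e_i > tol and alpha[i] > 0):
                j = rng.randrange(m - 1)
                if j >= i:
                    j += 1  # random partner j != i
                e_j = f(X[j]) - y[j]
                a_i, a_j = alpha[i], alpha[j]
                if y[i] != y[j]:
                    lo, hi = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    lo, hi = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                eta = 2 * dot(X[i], X[j]) - dot(X[i], X[i]) - dot(X[j], X[j])
                if lo == hi or eta >= 0:
                    continue
                # Move alpha_j along the constraint line, then clip to [lo, hi].
                alpha[j] = min(hi, max(lo, a_j - y[j] * (e_i - e_j) / eta))
                if abs(alpha[j] - a_j) < 1e-5:
                    continue
                alpha[i] = a_i + y[i] * y[j] * (a_j - alpha[j])
                # Recompute the bias from whichever multiplier is unbounded.
                b1 = (b - e_i - y[i] * (alpha[i] - a_i) * dot(X[i], X[i])
                      - y[j] * (alpha[j] - a_j) * dot(X[i], X[j]))
                b2 = (b - e_j - y[i] * (alpha[i] - a_i) * dot(X[i], X[j])
                      - y[j] * (alpha[j] - a_j) * dot(X[j], X[j]))
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def map_partition(partition: List[Sample]) -> List[Sample]:
    """Hypothetical map step: train locally, emit only the support vectors."""
    alpha, _ = smo_train(partition)
    return [s for s, a in zip(partition, alpha) if a > 1e-8]


def reduce_support_vectors(sv_lists: List[List[Sample]]):
    """Hypothetical reduce step: retrain on the union of support vectors."""
    merged = [s for svs in sv_lists for s in svs]
    alpha, b = smo_train(merged)
    w = [sum(a * label * x[d] for a, (x, label) in zip(alpha, merged))
         for d in range(len(merged[0][0]))]
    return w, b, len(merged)


def predict(w: Vector, b: float, x: Vector) -> int:
    return 1 if dot(w, x) + b >= 0 else -1


# Toy, made-up data: two well-separated classes, interleaved so that each
# hypothetical "node" partition receives both labels.
pos = [([2, 2], 1), ([3, 2], 1), ([2, 3], 1), ([3, 3], 1), ([2.5, 4], 1), ([4, 2.5], 1)]
neg = [([-px, -py], -1) for (px, py), _ in pos]
data = [s for pair in zip(pos, neg) for s in pair]
partitions = [data[:6], data[6:]]  # two simulated mapper inputs
w, b, n_sv = reduce_support_vectors([map_partition(p) for p in partitions])
```

Only support vectors cross the map/reduce boundary, so the reducer's training set is usually much smaller than the full dataset. That reduction is where the memory and time savings of this scheme come from. A single cascade level approximates an SVM trained on all the data at once, but it is not guaranteed to equal one.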