Integrating Genetic Algorithm with Random Forest for Improving the Classification Performance of Web Log Data

R. Mittal, Varun Malik, Vikram Singh, Jaiteg Singh, Amandeep Kaur
Published in: 2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC)
Publication date: 2020-11-06
DOI: 10.1109/PDGC50313.2020.9315807 (https://doi.org/10.1109/PDGC50313.2020.9315807)
Citations: 2

Abstract

Web mining is an important approach for retrieving and analysing information from web server log data. In the internet-driven information age, large volumes of data are available on the web in many forms, and analysing such data with web mining methods can yield novel insights. Such data can be extracted from server log files and preprocessed for use in various web mining tasks. In this paper, the authors used data from web server log files, preprocessed it, applied various classification algorithms such as Naïve Bayes, KNN, decision tree, and random forest, and analysed the results. The best approach was then chosen and integrated with a genetic algorithm to further improve the performance of the classifier. Specifically, a hybrid approach named RFGA, which integrates random forest and a genetic algorithm, was applied to the dataset, and the results of the different machine learning classifiers were compared with RFGA in terms of predictive accuracy.
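
The abstract does not say how the genetic algorithm is coupled to the random forest. One common pattern is GA-driven feature selection, where each individual is a bit mask over the input features and its fitness is the classifier's validation accuracy on the selected features. The sketch below illustrates only that general pattern and is not the authors' RFGA implementation. The names `evolve_mask`, `toy_fitness`, and `INFORMATIVE` are hypothetical, and the fitness function is a stand-in so the example runs with the standard library alone; in an RFGA-style setup it would presumably be replaced by a cross-validated random forest score.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = keep the feature, 0 = drop it


def evolve_mask(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Search for a high-fitness feature mask with a simple genetic algorithm."""
    rng = random.Random(seed)
    # Start from random masks.
    population = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0][:]]              # elitism: carry the best mask over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


# Stand-in fitness (assumption): pretend features 0, 3 and 5 are the
# informative ones, and penalise large masks slightly. A real setup would
# train and validate a random forest on the selected columns instead.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, best_score = evolve_mask(8, toy_fitness)
print(best_mask, best_score)
```

Keeping the fitness function as a parameter means the search loop stays the same whether the score comes from this toy function or from a trained classifier; only the callable passed to `evolve_mask` changes.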