An enhanced pre-processing technique for web log mining by removing web robots

P. Nithya, P. Sumathi
{"title":"An enhanced pre-processing technique for web log mining by removing web robots","authors":"P. Nithya, P. Sumathi","doi":"10.1109/ICCIC.2012.6510325","DOIUrl":null,"url":null,"abstract":"Nowadays, internet becomes useful source of information in day-to-day life. It creates huge development of World Wide Web in its quantity of interchange and its size and difficulty of websites. Web Usage Mining (WUM) is one of the main applications of data mining, artificial intelligence and so on to the web data and forecast the user's visiting behaviors and obtains their interests by investigating the samples. Since WUM directly involves in large range of applications, such as, ecommerce, e-learning, Web analytics, information retrieval etc. Weblog data is one of the major sources which contain all the information regarding the users visited links, browsing patterns, time spent on a particular page or link and this information can be used in several applications like adaptive web sites, modified services, customer summary, pre-fetching, generate attractive web sites etc. There are several problems related with the existing web usage mining approaches. Existing web usage mining algorithms suffer from difficulty of practical applicability. So, a novel research is necessary for the accurate prediction of future performance of web users with rapid execution time. WUM consists of preprocessing, pattern discovery and pattern analysis. Log data is characteristically noisy and unclear. Hence, preprocessing is an essential process for effective mining process. In this paper, a novel pre-processing technique is proposed by removing local and global noise and web robots. Anonymous Microsoft Web Dataset and MSNBC.com Anonymous Web Dataset are used for estimating the proposed preprocessing technique.","PeriodicalId":340238,"journal":{"name":"2012 IEEE International Conference on Computational Intelligence and Computing Research","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE International Conference on Computational Intelligence and Computing Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCIC.2012.6510325","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

Nowadays, internet becomes useful source of information in day-to-day life. It creates huge development of World Wide Web in its quantity of interchange and its size and difficulty of websites. Web Usage Mining (WUM) is one of the main applications of data mining, artificial intelligence and so on to the web data and forecast the user's visiting behaviors and obtains their interests by investigating the samples. Since WUM directly involves in large range of applications, such as, ecommerce, e-learning, Web analytics, information retrieval etc. Weblog data is one of the major sources which contain all the information regarding the users visited links, browsing patterns, time spent on a particular page or link and this information can be used in several applications like adaptive web sites, modified services, customer summary, pre-fetching, generate attractive web sites etc. There are several problems related with the existing web usage mining approaches. Existing web usage mining algorithms suffer from difficulty of practical applicability. So, a novel research is necessary for the accurate prediction of future performance of web users with rapid execution time. WUM consists of preprocessing, pattern discovery and pattern analysis. Log data is characteristically noisy and unclear. Hence, preprocessing is an essential process for effective mining process. In this paper, a novel pre-processing technique is proposed by removing local and global noise and web robots. Anonymous Microsoft Web Dataset and MSNBC.com Anonymous Web Dataset are used for estimating the proposed preprocessing technique.
一种去除网络机器人的增强web日志挖掘预处理技术
如今,互联网成为日常生活中有用的信息来源。它使万维网的交换量、网站的规模和难度都得到了巨大的发展。Web Usage Mining (Web Usage Mining, WUM)是数据挖掘、人工智能等技术对Web数据的主要应用之一,通过对样本的调查来预测用户的访问行为并获取用户的兴趣。由于WUM直接涉及广泛的应用,如电子商务,电子学习,网络分析,信息检索等。Weblog数据是包含用户访问链接、浏览模式、在特定页面或链接上花费的时间等所有信息的主要来源之一,这些信息可以用于几个应用程序,如自适应网站、修改服务、客户摘要、预获取、生成有吸引力的网站等。现有的web使用挖掘方法存在一些问题。现有的web使用率挖掘算法存在难以实际应用的问题。因此,有必要对快速执行时间下的网络用户的未来表现进行准确预测。WUM包括预处理、模式发现和模式分析。日志数据的特点是嘈杂和不清晰。因此,预处理是有效挖掘过程的必要环节。本文提出了一种去除局部和全局噪声和网络机器人的预处理方法。使用匿名微软网络数据集和MSNBC.com匿名网络数据集对所提出的预处理技术进行了估计。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信