Analysing Log Files For Web Intrusion Investigation Using Hadoop
Marlina Abdul Latib, Saiful Adli Ismail, O. Yusop, Pritheega Magalingam, Azri Azmi
International Conference on Software and Information Engineering, published 2018-05-02. DOI: 10.1145/3220267.3220269
Citations: 6
Abstract
The process of analyzing large amounts of log-file data helps organizations identify web intruders' activities as well as vulnerabilities in their websites. However, such analysis is a considerable challenge because it is time-consuming and can sometimes be inefficient. Existing or traditional log analyzers may not be able to handle such large volumes of data. The aim of this research is therefore to produce analysis results for web intrusion investigation in a Big Data environment. In this study, web logs were analyzed for attacks captured in web server log files. The logs were cleaned and refined by a log-preprocessing program before analysis. An experimental simulation was conducted using the Hadoop framework to produce the required analysis results. The results indicate that a Hadoop application can produce analysis results from large web log files to assist web intrusion investigation. In addition, the execution-time performance analysis shows that total execution time does not increase linearly with data size. This study also provides a solution for visualizing the analysis results using Power View and Hive.
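The pipeline described above (preprocess the web log, then analyze it for attacks with Hadoop) can be illustrated with a minimal, local sketch. It is not the authors' program. Several details are assumptions: the logs are taken to be in the Apache/Nginx "combined" format, and "cleaning" is taken to mean dropping malformed lines and static-resource requests. The `SIGNATURES` table is hypothetical, with deliberately simple patterns for SQL injection, XSS and path traversal. The `mapper` and `reducer` functions mirror the map and reduce phases that a Hadoop job would distribute across a cluster. Python's `sorted` plus `itertools.groupby` stands in for Hadoop's shuffle/sort step.

```python
import re
from itertools import groupby
from operator import itemgetter

# Assumed Apache/Nginx "combined" log layout; the paper's exact format is not stated.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+'
)

# Hypothetical, deliberately simple attack signatures (illustration only).
SIGNATURES = {
    "sql_injection": re.compile(r"union(\s|%20|\+)+select|(%27|')(\s|%20|\+)*or(\s|%20|\+)+1=1", re.I),
    "xss": re.compile(r"(<|%3c)script", re.I),
    "path_traversal": re.compile(r"\.\./|%2e%2e%2f", re.I),
}

# Requests for these resources are treated as noise during cleaning (an assumption).
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def preprocess(line):
    """Cleaning step: return (ip, path), or None for malformed or static-resource lines."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    path = match.group("path")
    if path.lower().split("?", 1)[0].endswith(STATIC_SUFFIXES):
        return None
    return match.group("ip"), path


def mapper(lines):
    """Map phase: emit ((ip, attack_type), 1) for each request matching a signature."""
    for line in lines:
        record = preprocess(line)
        if record is None:
            continue
        ip, path = record
        for attack, pattern in SIGNATURES.items():
            if pattern.search(path):
                yield (ip, attack), 1


def reducer(pairs):
    """Reduce phase: sum counts per (ip, attack_type); sorting stands in for the shuffle."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, sum(count for _, count in group)


# Usage: a tiny hand-made log sample (not data from the study).
SAMPLE_LOG = [
    '10.0.0.5 - - [02/May/2018:10:00:00 +0800] "GET /index.php?id=1+UNION+SELECT+1,2 HTTP/1.1" 200 512',
    '10.0.0.5 - - [02/May/2018:10:00:05 +0800] "GET /../../etc/passwd HTTP/1.1" 404 0',
    '10.0.0.7 - - [02/May/2018:10:01:00 +0800] "GET /search?q=%3Cscript%3Ealert(1)%3C/script%3E HTTP/1.1" 200 128',
    '10.0.0.8 - - [02/May/2018:10:02:00 +0800] "GET /style.css HTTP/1.1" 200 64',
    'malformed line',
]

for (ip, attack), count in reducer(mapper(SAMPLE_LOG)):
    print(ip, attack, count)
```

In a real Hadoop Streaming deployment, the mapper and reducer would instead read lines from standard input and write tab-separated key/value lines to standard output. Hadoop itself would handle partitioning and sorting between the two phases. Aggregated output of this kind is what a Hive table could then expose for querying and visualization. Signature matching this naive will miss obfuscated attacks and should not be read as the detection method used in the study.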