A parallel clustering algorithm for logs data based on Hadoop platform
Authors: J. Huo, Jia-Yow Weng, Hong Qu
Published in: Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications, 2019-03-08
DOI: 10.1145/3318265.3318281
Citations: 3
Abstract
Log analysis is an important means of reflecting the running status and user behavior of a network system, and it is also an important way to ensure network security. Because a single host cannot store or process log data at the scale that large-scale analysis requires, this paper proposes a clustering method for big Web-log data based on the MapReduce distributed computing framework. Experiments are conducted on the Hadoop platform. The relationships and rules present in the logs are examined and analyzed to uncover latent information. The method can support efficient storage, management, and mining of large-scale Web logs.
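The abstract does not name the clustering algorithm that is parallelized, nor the features extracted from the logs. Purely as an illustration, the sketch below assumes k-means over numeric per-session feature vectors (the request/kilobyte features are hypothetical) and simulates one MapReduce job per iteration in plain Python. The mapper assigns points to the nearest centroid, an optional combiner pre-aggregates each input split, a shuffle stand-in groups by key, and the reducer averages. It is not the authors' implementation and does not use the Hadoop API; all function names (mapper, combiner, reducer, kmeans, and so on) are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
Pair = Tuple[int, Tuple[Point, int]]  # (centroid index, (vector sum, count))


def mapper(split: Iterable[Point], centroids: Sequence[Point]) -> Iterator[Pair]:
    """Map: emit each point keyed by the index of its nearest centroid."""
    for p in split:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, (p, 1)


def group_by_key(pairs: Iterable[Pair]) -> Dict[int, List[Tuple[Point, int]]]:
    """Stand-in for Hadoop's shuffle/sort: collect all values per key."""
    groups: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def sum_values(values: List[Tuple[Point, int]]) -> Tuple[Point, int]:
    """Add up vector sums and counts; valid for raw points and partial sums alike."""
    dim = len(values[0][0])
    total = tuple(sum(v[d] for v, _ in values) for d in range(dim))
    return total, sum(c for _, c in values)


def combiner(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Optional local pre-aggregation on one split to shrink shuffle traffic."""
    for key, values in group_by_key(pairs).items():
        yield key, sum_values(values)


def reducer(key: int, values: List[Tuple[Point, int]]) -> Tuple[int, Point]:
    """Reduce: new centroid = (sum of assigned vectors) / (number of vectors)."""
    total, count = sum_values(values)
    return key, tuple(t / count for t in total)


def kmeans_iteration(splits: Sequence[Sequence[Point]], centroids: List[Point]) -> List[Point]:
    """One MapReduce job: map + combine per split, then shuffle and reduce."""
    mapped = [kv for split in splits for kv in combiner(mapper(split, centroids))]
    new_centroids = list(centroids)  # a centroid with no points keeps its old position
    for key, values in group_by_key(mapped).items():
        k, centroid = reducer(key, values)
        new_centroids[k] = centroid
    return new_centroids


def kmeans(splits, centroids, max_iter: int = 20, tol: float = 1e-9) -> List[Point]:
    """Driver loop: rerun the job until centroids stop moving (or max_iter is hit)."""
    for _ in range(max_iter):
        updated = kmeans_iteration(splits, centroids)
        if all(dist(a, b) <= tol for a, b in zip(centroids, updated)):
            return updated
        centroids = updated
    return centroids


# Hypothetical per-session features (requests, kilobytes), split across two "blocks".
splits = [
    [(1.0, 1.0), (1.5, 2.0)],
    [(8.0, 8.0), (9.0, 9.5), (1.0, 0.5)],
]
result = kmeans(splits, [(1.0, 1.0), (8.0, 8.0)])
print(result)
```

Keeping each value as a (vector sum, count) pair is what makes the combiner safe. Sums and counts simply add, so partial aggregates from different splits can be merged by the reducer without changing the result (up to floating-point rounding). On a real cluster, the driver loop would typically launch one job per iteration and ship the current centroids to every mapper.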