A parallel clustering algorithm for logs data based on Hadoop platform
J. Huo, Jia-Yow Weng, Hong Qu
Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications, 2019-03-08
DOI: 10.1145/3318265.3318281
Log analysis is an important means of understanding a network system's operating status and its users' behavior, and an important tool for ensuring network security. Because a single host cannot meet the storage and computation demands of large-scale log analysis, this paper proposes a clustering method for large volumes of Web log data built on the Map/Reduce distributed computing framework. Experiments are conducted on the Hadoop platform, and the relationships and patterns in the logs are analyzed to extract latent information. The method enables efficient storage, management, and mining of large-scale Web logs.
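The abstract does not name the clustering algorithm or say how log records become feature vectors. As a hedged illustration only, the sketch below assumes k-means, a common choice for Map/Reduce clustering that the source does not confirm. It simulates one Hadoop job per iteration in plain Python. The mapper assigns each record to its nearest centroid, the shuffle groups records by centroid id, and the reducer averages each group. The per-session features (for example, requests per session and mean response size) and all function names are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Iterable, Iterator

# Hypothetical feature vector for one Web log session, e.g. (requests, mean KB).
Point = tuple[float, ...]


def mapper(points: Iterable[Point], centroids: list[Point]) -> Iterator[tuple[int, tuple[Point, int]]]:
    """Map phase: emit (nearest centroid id, (point as a partial sum, count 1))."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, (p, 1)


def shuffle(pairs: Iterable[tuple[int, tuple[Point, int]]]) -> dict[int, list[tuple[Point, int]]]:
    """Shuffle phase: group all emitted values by centroid id, as Hadoop does between map and reduce."""
    groups: dict[int, list[tuple[Point, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: int, values: list[tuple[Point, int]]) -> tuple[int, Point]:
    """Reduce phase: add up partial sums and counts, then return the new centroid (the mean)."""
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for partial, n in values:
        for d in range(dim):
            sums[d] += partial[d]
        count += n
    return key, tuple(s / count for s in sums)


def kmeans_mapreduce(splits: list[list[Point]], centroids: list[Point],
                     max_iter: int = 20, tol: float = 1e-6) -> list[Point]:
    """Driver: run one simulated map/shuffle/reduce job per k-means iteration."""
    for _ in range(max_iter):
        # Each split stands in for one HDFS input split handled by its own mapper.
        pairs = [kv for split in splits for kv in mapper(split, centroids)]
        new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
        for key, values in shuffle(pairs).items():
            _, centroid = reducer(key, values)
            new_centroids[key] = centroid
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # stop once no centroid moves appreciably
            break
    return centroids


if __name__ == "__main__":
    # Hypothetical session features spread across two input splits.
    splits = [
        [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2)],
        [(8.0, 9.0), (8.5, 9.5), (7.8, 8.7)],
    ]
    print(kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)]))
```

On a real Hadoop cluster, the driver would submit a new job for each iteration and give every mapper the current centroids, for example through Hadoop's distributed cache. The reducer already accepts `(partial_sum, count)` pairs, so a combiner that emits per-split sums and counts could shrink shuffle traffic without changing the result.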