A Scalable Pattern Mining Method Using Apache Spark Platform
Samaneh Samiei, Mehdi Joodaki, Nasser Ghadiri
2021 7th International Conference on Web Research (ICWR), 19 May 2021. DOI: 10.1109/ICWR51868.2021.9443111
The amount of data on the Internet is growing rapidly. Some of it, such as log files, is enormous and contains valuable hidden patterns. A log file is a set of recorded events that carry information useful for improving web server performance, stabilizing and controlling server load, and speeding up responses to users. However, analyzing such massive data takes a long time and requires powerful hardware, and sequential pattern mining methods usually perform poorly on data of this size. This paper proposes a new parallel method, built on the Apache Spark platform, for finding patterns in log files. These include frequent patterns (e.g., URL, IP address, status code), how users accessed files, the number of errors, and the most common errors. Experimental results on three datasets show that the proposed method's run time is significantly lower than that of four competing pattern mining methods.
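The abstract does not describe the algorithm itself. The following plain-Python sketch gives a rough, hypothetical illustration of the kind of parallel counting involved. It mimics Spark's partition-then-merge pattern: each partition counts (field, value) pairs locally, and the partial counts are then merged, much as Spark's `reduceByKey` combines values across executors. This is not the authors' method. The following are all assumptions made for illustration:

- the Common Log Format regex;
- treating 4xx/5xx status codes as errors;
- the names `map_partition`, `mine_log`, and `top_k`.

Here the partitions run one after another in a single process rather than on a cluster.

```python
import re
from collections import Counter
from functools import reduce
from typing import Iterable, List, Tuple

# Hypothetical parser for Apache Common Log Format (CLF) lines, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def map_partition(lines: Iterable[str]) -> Counter:
    """'Map' step: count (field, value) pairs within one partition of the log."""
    counts: Counter = Counter()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip malformed lines instead of failing the whole job
        counts[("ip", m["ip"])] += 1
        counts[("url", m["url"])] += 1
        counts[("status", m["status"])] += 1
        # Assumption: any 4xx or 5xx status code counts as an error.
        if m["status"].startswith(("4", "5")):
            counts[("error", m["status"])] += 1
    return counts


def mine_log(lines: List[str], num_partitions: int = 4) -> Counter:
    """Split lines into partitions, count each one, then merge partial counts."""
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    # In Spark each partition would be a task on an executor; here they run serially.
    partial = map(map_partition, partitions)
    # 'Reduce' step: add the per-partition counters together.
    return reduce(lambda a, b: a + b, partial, Counter())


def top_k(counts: Counter, field: str, k: int = 3) -> List[Tuple[str, int]]:
    """Most frequent values of one field, e.g. the most common error codes."""
    return Counter({v: c for (f, v), c in counts.items() if f == field}).most_common(k)
```

For example, `top_k(mine_log(lines), "error")` would list the most common error status codes, and summing the `("error", …)` entries gives the total number of errors. Keeping every field in one counter means the whole log is scanned once rather than once per field.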