The Runtime System Problem Identification Method Based on Log Analysis
Authors: Yanfang Liu, J. Lv, Shilong Ma, Wentao Yao
Published in: 2018 27th International Conference on Computer Communication and Networks (ICCCN), 2018-07-01
DOI: 10.1109/ICCCN.2018.8487466 (https://doi.org/10.1109/ICCCN.2018.8487466)
System logs are an important source of information for system administrators who monitor system behavior and identify system problems. Manual examination is infeasible for complex systems, and existing automated methods for identifying system problems each have drawbacks: heavy dependence on the system's source code, low accuracy in predicting or identifying problems, or the need for a balanced, labeled training data set. This paper proposes a method based on the one-class Support Vector Machine (OCSVM) for identifying runtime system problems. First, log messages are parsed to generate log sequences that describe the running trajectories of the monitored system. Second, variable-length n-gram features are extracted, and each log sequence is represented as a feature vector based on these features and the Vector Space Model (VSM). Finally, the feature vectors of the training set, which contains only labeled normal log sequences, are used to train the OCSVM. Experimental results show that training the OCSVM on these feature vectors with a linear kernel performs better than training it with a Gaussian kernel, and that the size of the sliding window hardly affects the method's performance. Moreover, on an unbalanced training data set the proposed method outperforms the method based on Random Indexing (RI) and weighted SVM.
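The feature-extraction step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the variable-length n-grams are selected or weighted, so the choices here are assumptions: every contiguous n-gram with length between `n_min` and `n_max` is kept, and each sequence becomes an L2-normalized term-frequency vector. The event identifiers (`E1`, `E2`, ...) and all function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(seq, n_min=1, n_max=3):
    """Return all contiguous n-grams of seq with n_min <= n <= n_max, as tuples."""
    return [tuple(seq[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(seq) - n + 1)]


def build_vocabulary(sequences, n_min=1, n_max=3):
    """Assign a column index to every n-gram seen in the training sequences."""
    vocab = {}
    for seq in sequences:
        for gram in ngrams(seq, n_min, n_max):
            vocab.setdefault(gram, len(vocab))
    return vocab


def vectorize(seq, vocab, n_min=1, n_max=3):
    """Represent seq as an L2-normalized term-frequency vector over vocab.

    N-grams that never occurred in training are ignored (an assumption).
    """
    counts = Counter(g for g in ngrams(seq, n_min, n_max) if g in vocab)
    vec = [0.0] * len(vocab)
    for gram, count in counts.items():
        vec[vocab[gram]] = float(count)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Hypothetical normal log sequences: each item is a parsed log event ID.
normal_sequences = [["E1", "E2", "E3"], ["E1", "E2", "E3", "E2"]]
vocab = build_vocabulary(normal_sequences, n_min=1, n_max=2)
X_train = [vectorize(s, vocab, n_min=1, n_max=2) for s in normal_sequences]
```

Vectors such as `X_train` could then be passed to any one-class SVM implementation trained on normal data only, for example scikit-learn's `OneClassSVM` with `kernel="linear"`; that step is omitted here to keep the sketch dependency-free.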