{"title":"Automatic classification of security messages based on text categorization","authors":"F. Benali, S. Ubéda, V. Legrand","doi":"10.1145/1416729.1416741","DOIUrl":null,"url":null,"abstract":"The generated messages by the security devices are the necessary data for the detection of the malicious activities in an information system. The heterogeneity of the devices and the lack of a standard for the security messages make the automatic processing of the messages difficult. The messages are short, use a very wide vocabulary and have different formats. We propose in this article the application of the text categorization technics for the automatic classification of security log files messages, in categories defined by an ontology. We develop an extraction module for the message attributes to reduce the vocabulary size. Then we apply two training algorithms: the k-nearest neighbour algorithm and the naive bayes, on two corpus of security log messages.","PeriodicalId":321308,"journal":{"name":"NOuvelles TEchnologies de la REpartition","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"NOuvelles TEchnologies de la REpartition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1416729.1416741","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The generated messages by the security devices are the necessary data for the detection of the malicious activities in an information system. The heterogeneity of the devices and the lack of a standard for the security messages make the automatic processing of the messages difficult. The messages are short, use a very wide vocabulary and have different formats. We propose in this article the application of the text categorization technics for the automatic classification of security log files messages, in categories defined by an ontology. We develop an extraction module for the message attributes to reduce the vocabulary size. Then we apply two training algorithms: the k-nearest neighbour algorithm and the naive bayes, on two corpus of security log messages.