S. Puuska, M. Kortelainen, Viljami Venekoski, J. Vankka
Published in: 2016 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (CyberSA), 13 June 2016. DOI: 10.1109/CyberSA.2016.7503294
Instant message classification in Finnish cyber security themed free-form discussion
Instant messaging enables rapid collaboration between professionals during cyber security incidents. However, monitoring discussions manually becomes challenging as the number of communication channels grows, and failing to identify relevant information in free-form instant messages may reduce situational awareness. In this paper, we address the problem by developing a framework for classifying the topics of instant messages in Finnish cyber security-themed discussion. The program uses open source software components for morphological analysis, then converts the messages into Bag-of-Words representations before classifying them into predetermined incident categories. We compared support vector machine (SVM), multinomial naïve Bayes, and complement naïve Bayes (CNB) classifiers using five-fold cross-validation. A combination of SVM and CNB achieved a classification accuracy of over 85%, while multiclass SVM achieved 87%. The implemented program recognizes cyber security-related messages in IRC chat rooms and categorizes them accordingly.
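The pipeline summarized above (morphological analysis, Bag-of-Words conversion, classification, five-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. It uses only the Python standard library, and several parts are assumptions: `tokenize` is a hypothetical stand-in for the open source morphological analysis (it only lowercases and splits text, with no lemmatization), only the complement naïve Bayes classifier is shown, and the incident categories and messages in `TOY_DATA` are invented toy examples.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Hypothetical stand-in for morphological analysis: lowercase and split
    into letter runs. A real Finnish pipeline would map each inflected form
    to its base form (lemma) at this step."""
    return re.findall(r"[a-zåäö]+", message.lower())


def bag_of_words(message):
    """Bag-of-Words representation: term -> occurrence count; order is discarded."""
    return Counter(tokenize(message))


class ComplementNB:
    """Complement naive Bayes: each class is scored with term statistics
    gathered from all *other* classes (its complement)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, messages, labels):
        bows = [bag_of_words(m) for m in messages]
        vocab = sorted({t for bow in bows for t in bow})
        self.classes = sorted(set(labels))
        totals, per_class = Counter(), defaultdict(Counter)
        for bow, label in zip(bows, labels):
            totals.update(bow)
            per_class[label].update(bow)
        self.log_weights = {}
        for c in self.classes:
            # Term counts in every class except c (the "complement" of c).
            comp = {t: totals[t] - per_class[c][t] for t in vocab}
            denom = sum(comp.values()) + self.alpha * len(vocab)
            self.log_weights[c] = {
                t: math.log((comp[t] + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, message):
        bow = bag_of_words(message)

        def complement_fit(c):
            # Log-likelihood of the message under the complement of c;
            # terms unseen during training are ignored.
            w = self.log_weights[c]
            return sum(n * w[t] for t, n in bow.items() if t in w)

        # Pick the class whose complement explains the message worst.
        return min(self.classes, key=complement_fit)


def k_fold_accuracy(messages, labels, k=5):
    """Pooled accuracy over k interleaved (unshuffled, unstratified) folds."""
    n = len(messages)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train = [i for i in range(n) if i not in test_idx]
        model = ComplementNB().fit([messages[i] for i in train],
                                   [labels[i] for i in train])
        correct += sum(model.predict(messages[i]) == labels[i] for i in test_idx)
    return correct / n


# Invented toy data with hypothetical incident categories.
TOY_DATA = [
    ("malware found on the server", "malware"),
    ("phishing email asks for a password", "phishing"),
    ("ransomware encrypted the server disk", "malware"),
    ("users clicked a phishing link", "phishing"),
    ("infected host runs ransomware", "malware"),
    ("fake login page steals passwords", "phishing"),
    ("malware beacon from infected host", "malware"),
    ("report phishing email to admins", "phishing"),
    ("quarantine the infected server", "malware"),
    ("email link opens fake login page", "phishing"),
]

if __name__ == "__main__":
    messages = [m for m, _ in TOY_DATA]
    labels = [y for _, y in TOY_DATA]
    print("5-fold accuracy:", k_fold_accuracy(messages, labels, k=5))
    print(ComplementNB().fit(messages, labels).predict("ransomware on infected server"))
```

Interleaved folds keep the sketch deterministic; a real evaluation would typically shuffle and stratify the folds. For the SVM and multinomial naïve Bayes comparisons, scikit-learn provides `CountVectorizer`, `MultinomialNB`, `ComplementNB`, `LinearSVC`, and `cross_val_score`. The sketch does not attempt the SVM and CNB combination, because the abstract does not describe how the two were combined.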