Improving Multiclass Classification of Cybersecurity Breaches in Railway Infrastructure using Imbalanced Learning

Proceedings of the 2021 European Symposium on Software Engineering. Pub Date: 2021-11-19. DOI: 10.1145/3501774.3501789

Aleksandr N. Nebaba, I. Savvas, M. Butakova, A. Chernov, Petr S. Shevchuk

Abstract

Machine learning approaches and algorithms are spreading across many areas of research and technology. Cybersecurity breaches are common anomalies in networked and distributed infrastructures, and they are carefully monitored, registered, and described. However, describing and classifying each breach episode remains a difficult problem, especially in highly complex telecommunication infrastructure. Railway information infrastructure is usually large in scale and exposed to a wide diversity of possible security breaches. At present, the registration of security breaches is mature and stable, but their automated classification is not completely solved. Many studies on multiclass classification of security breaches report inadequate classification accuracy. We investigated the origins of this problem and suggest that a possible root cause is imbalance in the datasets used for machine learning multiclass classification. We therefore propose an approach to improve classification accuracy and verify it on real collected datasets of cybersecurity breaches in railway telecommunication infrastructure. We analyzed the results of applying three imbalanced learning methodologies: random oversampling, the synthetic minority oversampling technique (SMOTE), and SMOTE combined with Tomek links. We implemented three machine learning algorithms, namely Naïve Bayes, K-means, and support vector machine, on imbalanced and balanced data, and compared the results to evaluate the imbalanced learning methodologies. The proposed approach demonstrated an increase in multiclass classification accuracy in the range of 30 to 41%, depending on the imbalanced learning technique.
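The three resampling ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names (`random_oversample`, `smote_like`, `tomek_links`, `smote_tomek`), the toy feature vectors, and the class labels ("scan", "dos", "intrusion") are all hypothetical and chosen only for illustration. The SMOTE variant is simplified, and dropping both members of each Tomek link is one common convention assumed here, not something the abstract specifies.

```python
import math
import random
from collections import Counter, defaultdict


def _group(X, y):
    """Group feature vectors by class label."""
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    return by_class


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class is as large
    as the largest class."""
    rng = random.Random(seed)
    by_class = _group(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, k=2, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between
    an original sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    by_class = _group(X, y)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = list(X), list(y)
    for label, rows in by_class.items():
        for _ in range(target - len(rows)):
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            if others:
                nearest = sorted(others, key=lambda r: math.dist(base, r))[:k]
                mate = rng.choice(nearest)
                gap = rng.random()
                new = tuple(b + gap * (m - b) for b, m in zip(base, mate))
            else:
                new = base  # a single-sample class can only be duplicated
            X_out.append(new)
            y_out.append(label)
    return X_out, y_out


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    links = []
    for i in range(len(X)):
        j = nearest(i)
        if i < j and nearest(j) == i and y[i] != y[j]:
            links.append((i, j))
    return links


def smote_tomek(X, y, k=2, seed=0):
    """Oversample with smote_like, then drop both members of each Tomek link
    (an assumed cleaning convention)."""
    X_s, y_s = smote_like(X, y, k, seed)
    drop = {i for pair in tomek_links(X_s, y_s) for i in pair}
    keep = [i for i in range(len(X_s)) if i not in drop]
    return [X_s[i] for i in keep], [y_s[i] for i in keep]


# Hypothetical imbalanced toy data: 6 / 2 / 1 samples per class.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.3), (0.1, 0.0),
     (2.0, 2.0), (2.2, 2.1), (5.0, 5.0)]
y = ["scan"] * 6 + ["dos"] * 2 + ["intrusion"]

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote_like(X, y)
X_st, y_st = smote_tomek(X, y)
print(Counter(y), Counter(y_ros), Counter(y_sm))
print(tomek_links([(0.0,), (1.0,), (1.1,), (3.0,)], ["a", "a", "b", "b"]))  # → [(1, 2)]
```

In practice a maintained library would likely be preferable to hand-written resampling; the imbalanced-learn package, for example, provides `RandomOverSampler`, `SMOTE`, and `SMOTETomek`. Resampling is normally applied to the training split only, so that the test split keeps the original class distribution.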
