Improving Multiclass Classification of Cybersecurity Breaches in Railway Infrastructure using Imbalanced Learning

Aleksandr N. Nebaba, I. Savvas, M. Butakova, A. Chernov, Petr S. Shevchuk
{"title":"Improving Multiclass Classification of Cybersecurity Breaches in Railway Infrastructure using Imbalanced Learning","authors":"Aleksandr N. Nebaba, I. Savvas, M. Butakova, A. Chernov, Petr S. Shevchuk","doi":"10.1145/3501774.3501789","DOIUrl":null,"url":null,"abstract":"Machine learning approaches and algorithms are spreading in wide areas in research and technology. Cybersecurity breaches are the common anomalies for networked and distributed infrastructures which are monitored, registered, and described carefully. However, the description of each security breaches episode and its classification is still a difficult problem, especially in highly complex telecommunication infrastructure. Railway information infrastructure usually has a large scale and large diversity of possible security breaches. Today's situation shows the registering of the security breaches has a mature and stable character, but the problem of their automated classification is not solved completely. Many studies on security breaches multiclass classification show inadequate accuracy of classification. We investigated the origins of this problem and suggested the possible roots consist in disbalance the datasets used for machine learning multiclass classification. Thus, we proposed an approach to improve the accuracy of the classification and verified our approach on the really collected datasets with cybersecurity breaches in railway telecommunication infrastructure. We analyzed the results of applying three imbalanced learning methodologies, namely random oversampling, synthetic minority oversampling technique, and the last one with Tomek links. We have implemented three machine learning algorithms, namely Naïve Bayes, K-means, and support vector machine, on disbalances and balanced data to estimate imbalance learning methodologies with comparing results. The proposed approach demonstrated the increase of the accuracy for multiclass classification in the range from 30 to 41%, depending on the imbalanced learning technique.","PeriodicalId":255059,"journal":{"name":"Proceedings of the 2021 European Symposium on Software Engineering","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 European Symposium on Software Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3501774.3501789","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Machine learning approaches and algorithms are spreading in wide areas in research and technology. Cybersecurity breaches are the common anomalies for networked and distributed infrastructures which are monitored, registered, and described carefully. However, the description of each security breaches episode and its classification is still a difficult problem, especially in highly complex telecommunication infrastructure. Railway information infrastructure usually has a large scale and large diversity of possible security breaches. Today's situation shows the registering of the security breaches has a mature and stable character, but the problem of their automated classification is not solved completely. Many studies on security breaches multiclass classification show inadequate accuracy of classification. We investigated the origins of this problem and suggested the possible roots consist in disbalance the datasets used for machine learning multiclass classification. Thus, we proposed an approach to improve the accuracy of the classification and verified our approach on the really collected datasets with cybersecurity breaches in railway telecommunication infrastructure. We analyzed the results of applying three imbalanced learning methodologies, namely random oversampling, synthetic minority oversampling technique, and the last one with Tomek links. We have implemented three machine learning algorithms, namely Naïve Bayes, K-means, and support vector machine, on disbalances and balanced data to estimate imbalance learning methodologies with comparing results. The proposed approach demonstrated the increase of the accuracy for multiclass classification in the range from 30 to 41%, depending on the imbalanced learning technique.
利用不平衡学习改进铁路基础设施网络安全漏洞多类分类
机器学习方法和算法正在广泛的研究和技术领域中传播。网络安全漏洞是网络和分布式基础设施的常见异常,需要仔细监控、注册和描述。然而,安全漏洞事件的描述及其分类仍然是一个难题,特别是在高度复杂的电信基础设施中。铁路信息基础设施通常具有规模大、种类多的安全漏洞。目前的情况表明,安全漏洞的登记具有成熟和稳定的特点,但其自动分类的问题并没有完全解决。许多关于安全漏洞多类分类的研究表明,分类的准确性不足。我们研究了这个问题的根源,并提出可能的根源在于用于机器学习多类分类的数据集的不平衡。因此,我们提出了一种提高分类准确性的方法,并在铁路电信基础设施中具有网络安全漏洞的真实收集数据集上验证了我们的方法。我们分析了三种不平衡学习方法的结果,即随机过采样技术、合成少数过采样技术和最后一种带有Tomek链接的不平衡学习方法。我们在失衡和平衡数据上实现了三种机器学习算法,即Naïve贝叶斯、K-means和支持向量机,通过比较结果来估计失衡学习方法。根据不平衡学习技术的不同,所提出的方法可以将多类分类的准确率提高30%到41%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信