Scalable and Fast Hierarchical Clustering of IoT Malware Using Active Data Selection

Tianxiang He, Chansu Han, Takeshi Takahashi, S. Kijima, Jun’ichi Takeuchi
Published in: 2021 Sixth International Conference on Fog and Mobile Edge Computing (FMEC)
Publication date: 2021-12-06
DOI: 10.1109/FMEC54266.2021.9732550 (https://doi.org/10.1109/FMEC54266.2021.9732550)
Citations: 3

Abstract

The number of IoT malware specimens has increased rapidly and diversified in recent years. To analyze a large number of malware specimens efficiently, we aim to reduce the computational cost by clustering specimens with an incomplete distance matrix. Toward this goal, we applied the active clustering algorithm. In this algorithm, Mean-Field Annealing (MFA) is used to determine the best clustering, and the expected value of information criterion is used to actively choose which pair of specimens should have its distance observed. We evaluated the active clustering algorithm with 3,008 malware specimens. With the active clustering algorithm, we need to calculate only 2.6% of the whole distance matrix. The algorithm achieved 86.9% family name accuracy and 96.5% architecture name accuracy. Furthermore, it reached the same level of accuracy as our former clustering algorithm while observing only 2.6% of the distance matrix, whereas our former algorithm needs to observe 7.2%. The observation reduction rate is 64%.
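
The figures in the abstract can be checked with a little arithmetic. The sketch below is only illustrative: it assumes the distance matrix is symmetric, so that the "whole distance matrix" corresponds to n(n-1)/2 distinct specimen pairs (the abstract does not state how entries are counted), and the helper names `pair_count` and `reduction_rate` are hypothetical, not taken from the paper.

```python
def pair_count(n: int) -> int:
    """Number of distinct unordered pairs among n specimens.

    Assumption: distances are symmetric, so each pair is counted once.
    """
    return n * (n - 1) // 2


def reduction_rate(former: float, active: float) -> float:
    """Relative reduction in observed distances, as a fraction of `former`."""
    return (former - active) / former


n_specimens = 3008        # specimens reported in the abstract
active_fraction = 0.026   # 2.6% observed by the active clustering algorithm
former_fraction = 0.072   # 7.2% observed by the former algorithm

total_pairs = pair_count(n_specimens)
print("distinct pairs:", total_pairs)
print("approx. pairs observed (active):", round(total_pairs * active_fraction))
print("approx. pairs observed (former):", round(total_pairs * former_fraction))
print(f"observation reduction: {reduction_rate(former_fraction, active_fraction):.1%}")
```

Under that assumption, the reduction works out to (7.2 - 2.6) / 7.2, which rounds to the 64% quoted in the abstract.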