Generating Rule-Based Signatures for Detecting Polymorphic Variants Using Data Mining and Sequence Alignment Approaches

Vijay Naidu, Jacqueline L. Whalley, A. Narayanan
{"title":"Generating Rule-Based Signatures for Detecting Polymorphic Variants Using Data Mining and Sequence Alignment Approaches","authors":"Vijay Naidu, Jacqueline L. Whalley, A. Narayanan","doi":"10.4236/JIS.2018.94019","DOIUrl":null,"url":null,"abstract":"Antiviral software systems (AVSs) have problems in detecting polymorphic variants of viruses without specific signatures for such variants. Previous alignment-based approaches for automatic signature extraction have shown how signatures can be generated from consensuses found in polymorphic variant code. Such sequence alignment approaches required variable length viral code to be extended through gap insertions into much longer equal length code for signature extraction through data mining of consensuses. Non-nested generalized exemplars (NNge) are used in this paper in an attempt to further improve the automatic detection of polymorphic variants. The important contribution of this paper is to compare a variable length data mining technique using viral source code to the previously used equal length data mining technique obtained through sequence alignment. This comparison was achieved by conducting three different experiments (i.e. Experiments I-III). Although Experiments I and II generated unique and effective syntactic signatures, Experiment III generated the most effective signatures with an average detection rate of over 93%. The implications are that future, syntactic-based smart AVSs may be able to generate effective signatures automatically from malware code by adopting data mining and alignment techniques to cover for both known and unknown polymorphic variants and without the need for semantic (run-time) analysis.","PeriodicalId":57259,"journal":{"name":"信息安全(英文)","volume":"9 1","pages":"265-298"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"信息安全(英文)","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.4236/JIS.2018.94019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Antiviral software systems (AVSs) have problems in detecting polymorphic variants of viruses without specific signatures for such variants. Previous alignment-based approaches for automatic signature extraction have shown how signatures can be generated from consensuses found in polymorphic variant code. Such sequence alignment approaches required variable length viral code to be extended through gap insertions into much longer equal length code for signature extraction through data mining of consensuses. Non-nested generalized exemplars (NNge) are used in this paper in an attempt to further improve the automatic detection of polymorphic variants. The important contribution of this paper is to compare a variable length data mining technique using viral source code to the previously used equal length data mining technique obtained through sequence alignment. This comparison was achieved by conducting three different experiments (i.e. Experiments I-III). Although Experiments I and II generated unique and effective syntactic signatures, Experiment III generated the most effective signatures with an average detection rate of over 93%. The implications are that future, syntactic-based smart AVSs may be able to generate effective signatures automatically from malware code by adopting data mining and alignment techniques to cover for both known and unknown polymorphic variants and without the need for semantic (run-time) analysis.
使用数据挖掘和序列比对方法生成用于检测多态变体的基于规则的签名
反病毒软件系统(AVS)在检测病毒的多态变体时存在问题,而没有针对这些变体的特定签名。以前用于自动签名提取的基于比对的方法已经表明了如何从多态变体代码中发现的一致性中生成签名。这种序列比对方法需要通过间隙插入将可变长度的病毒代码扩展到更长的等长代码中,以便通过共识的数据挖掘提取签名。本文使用了非嵌套广义样本(NNge),试图进一步改进多态变体的自动检测。本文的重要贡献是将使用病毒源代码的可变长度数据挖掘技术与以前使用的通过序列比对获得的等长数据挖掘技术进行比较。这种比较是通过进行三个不同的实验(即实验i-III)来实现的。尽管实验一和实验二产生了独特有效的句法签名,但实验三产生了最有效的签名,平均检测率超过93%。其含义是,通过采用数据挖掘和对齐技术来覆盖已知和未知的多态变体,未来基于句法的智能AVS可能能够从恶意软件代码中自动生成有效的签名,而无需进行语义(运行时)分析。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
211
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信