Generating Rule-Based Signatures for Detecting Polymorphic Variants Using Data Mining and Sequence Alignment Approaches

Journal of Information Security. Pub Date: 2018-10-11. DOI: 10.4236/JIS.2018.94019

Vijay Naidu, Jacqueline L. Whalley, A. Narayanan

Journal of Information Security, volume 9, pages 265-298. Publication type: Journal Article.

Citations: 5

Abstract

Antiviral software systems (AVSs) have problems in detecting polymorphic variants of viruses without specific signatures for such variants. Previous alignment-based approaches for automatic signature extraction have shown how signatures can be generated from consensuses found in polymorphic variant code. Such sequence alignment approaches required variable length viral code to be extended through gap insertions into much longer equal length code for signature extraction through data mining of consensuses. Non-nested generalized exemplars (NNge) are used in this paper in an attempt to further improve the automatic detection of polymorphic variants. The important contribution of this paper is to compare a variable length data mining technique using viral source code to the previously used equal length data mining technique obtained through sequence alignment. This comparison was achieved by conducting three different experiments (i.e. Experiments I-III). Although Experiments I and II generated unique and effective syntactic signatures, Experiment III generated the most effective signatures with an average detection rate of over 93%. The implications are that future, syntactic-based smart AVSs may be able to generate effective signatures automatically from malware code by adopting data mining and alignment techniques to cover for both known and unknown polymorphic variants and without the need for semantic (run-time) analysis.
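The gap-insertion idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline (the paper's signatures come from data mining with NNge, and its alignment tooling is not described here); it is a textbook Needleman-Wunsch pairwise alignment, assumed here purely for illustration, that pads two variable-length sequences to equal length and then reads off a consensus. The function names `align` and `consensus`, the scoring values, and the `*` wildcard are all hypothetical choices.

```python
from typing import Tuple

GAP = "-"  # assumed gap symbol; input sequences must not contain it


def align(a: str, b: str, match: int = 1, mismatch: int = -1,
          gap: int = -1) -> Tuple[str, str]:
    """Globally align two sequences (Needleman-Wunsch), inserting gaps
    so that both returned strings have the same length."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(GAP)
            i -= 1
        else:
            out_a.append(GAP)
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def consensus(x: str, y: str) -> str:
    """Keep positions where the aligned sequences agree; mark the rest '*'."""
    return "".join(c if c == d else "*" for c, d in zip(x, y))
```

For example, `align("ABCDEF", "ABDEF")` returns `("ABCDEF", "AB-DEF")`, and `consensus` on that pair gives `"AB*DEF"`. The conserved runs around the wildcard are the kind of material a consensus-based signature could be built from; how the paper turns such consensuses into rules is not specified in the abstract.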

Source journal: Journal of Information Security

Self-citation rate: 0.00%

Articles published: 211