Efficient Binary Static Code Data Flow Analysis Using Unsupervised Learning

James Obert, Timothy Loffredo
Published in: 2021 4th International Conference on Artificial Intelligence for Industries (AI4I)
Publication date: 2021-09-01
DOI: 10.1109/AI4I51902.2021.00030
URL: https://doi.org/10.1109/AI4I51902.2021.00030

Abstract

The ever-increasing need to ensure that code is constructed reliably, efficiently, and safely has driven the evolution of popular static binary code analysis tools. To identify potential coding flaws in binaries, tools such as IDA Pro disassemble them into an opcode/assembly language format in support of manual static code analysis. Because analyzing large binaries is highly manual and resource intensive, the probability of overlooking potential coding irregularities and inefficiencies is quite high. This paper describes a lightweight, unsupervised data flow methodology that uses highly correlated data flow graphs (CDFGs) to identify coding irregularities while minimizing analysis time and required computing resources. These gains in analysis accuracy and efficiency are achieved by combining graph analysis with unsupervised machine learning techniques, which allows an analyst to focus on the most statistically significant flow patterns while performing binary static code analysis.
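
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "highly correlated data flow graphs", not the authors' method. It assumes each function has already been reduced to a vector of data flow edge counts; the function names, the four edge kinds, the numbers, and the 0.9 threshold are all hypothetical. Functions whose profiles correlate strongly are grouped together, and any function left alone is surfaced for the analyst to inspect.

```python
from itertools import combinations
from math import sqrt

# Hypothetical input: per-function counts of data flow edge kinds, in the
# assumed order (reg->reg, reg->mem, mem->reg, imm->reg). In practice these
# would come from a disassembler's def-use analysis; the values are made up.
FLOW_PROFILES = {
    "func_a": [12, 4, 5, 2],
    "func_b": [24, 8, 10, 4],
    "func_c": [11, 5, 4, 2],
    "func_d": [1, 9, 0, 14],
}


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def correlated_groups(profiles, threshold=0.9):
    """Group functions linked by pairwise correlation >= threshold.

    Groups are the connected components of the "highly correlated" graph,
    found with a small union-find. No labels are needed, so the grouping
    is unsupervised.
    """
    parent = {name: name for name in profiles}

    def find(name):
        # Walk to the component root, shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


groups = correlated_groups(FLOW_PROFILES)
# Functions that correlate with nothing else are candidates for manual review.
outliers = [members[0] for members in groups if len(members) == 1]
print(groups)
print(outliers)
```

Thresholded correlation with connected components is used here only because it fits in a few lines; a real pipeline could substitute any unsupervised clustering technique and a richer graph representation.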