PIdARCI: Using Assembly Instruction Patterns to Identify, Annotate, and Revert Compiler Idioms

2021 18th International Conference on Privacy, Security and Trust (PST). Pub Date: 2021-12-13. DOI: 10.1109/PST52912.2021.9647781

Steffen Enders, M. Rybalka, Elmar Padilla


Citations: 3

Abstract

Analysis of binary code is a building block of computer security. Especially in malware or firmware analysis, where source code is often unavailable, techniques like decompilation are used to figure out the functionality of binaries. During the optimization phase in modern compilers, human-readable expressions are often transformed into instruction sequences (compiler idioms, or simply idioms) that may be more efficient in terms of speed or size than the direct translation. However, these transformations are often considerably worse in terms of readability for the analyst. Such compiler-specific sequences are not only significantly longer than the apparent translation of the original high-level language operation but also have no trivial correlation to the original expression's semantics. Modern decompilers address this issue by reverting idioms using static, manually crafted rules. In this paper, we introduce a novel approach to find and annotate arithmetic idioms with their corresponding high-level language expressions to significantly simplify manual analysis. In contrast to previous approaches, our method requires no manual work to create the patterns for matching idioms and significantly less manual labour to derive the transformation rules that calculate the original constants. In our evaluation, we compared the results of PIdARCI against the current academic and commercial state of the art: Ghidra, RetDec, and Hex Rays / IDA Pro. We show that PIdARCI matches more than 99% of all considered idioms, exceeding the matching rate of the other approaches.
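To make the notion of an arithmetic idiom concrete, the following minimal sketch emulates one widely known case: unsigned 32-bit division by 10 replaced with a multiplication by a "magic" constant followed by a right shift. It is an illustration only and not PIdARCI's implementation; the constant pair (0xCCCCCCCD, 35) is the commonly cited one for this divisor, and the function names `div10_idiom` and `recover_divisor` are hypothetical.

```python
# Illustrative sketch, not the paper's method: a classic division idiom.
MAGIC = 0xCCCCCCCD  # commonly cited multiplier for unsigned 32-bit division by 10
SHIFT = 35          # total right shift applied to the 64-bit product


def div10_idiom(x: int) -> int:
    """Emulate the multiply-and-shift sequence a compiler may emit for x / 10."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    return (x * MAGIC) >> SHIFT


def recover_divisor(magic: int, shift: int) -> int:
    """Hypothetical helper: recover the original divisor as ceil(2**shift / magic)."""
    return -(-(1 << shift) // magic)


# The idiom agrees with plain integer division on these sample inputs.
samples = [0, 1, 9, 10, 99, 12345, 2**31, 2**32 - 1]
assert all(div10_idiom(x) == x // 10 for x in samples)

print(recover_divisor(MAGIC, SHIFT))  # prints 10
```

An analyst who sees only the multiplication and shift has to work out that the sequence means `x / 10`; annotating the sequence with that expression, including the recovered constant, is the kind of simplification the abstract describes.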

