Source code obfuscation with genetic algorithms using LLVM code optimizations

Pub Date: 2024-05-14 DOI: 10.1093/jigpal/jzae069

J. C. de la Torre, Javier Jareño, J. M. Aragón-Jurado, Sébastien Varrette, B. Dorronsoro


Citations: 0

Abstract

With the advent of the cloud computing model, which allows shared access to massive computing facilities, demand is surging for the protection of the intellectual property tied to the programs executed on these uncontrolled systems. While novel paradigms such as confidential computing aim to protect the data manipulated during execution, obfuscation techniques (in particular at the source code level) remain a popular solution to conceal the purpose of a program or its logic without altering its functionality, thus preventing reverse engineering of the program even with the help of computing resources. The many advantages of code obfuscation, together with its low cost, make it a popular technique. This paper proposes a novel methodology for source code obfuscation that can be used together with other traditional obfuscation techniques, making the code more robust against reverse engineering attacks. Three program complexity metrics are used to define three different single-objective combinatorial optimization versions of the problem, which are solved and analysed. Additionally, three multi-objective problems are defined, each combining one of the selected metrics with the program execution time, so that strong obfuscations do not penalize performance. The goal of the defined problems is to find sequences of LLVM optimizations that lead to highly obfuscated versions of the original code. These transformations are applied to the back-end pseudo-assembly code (i.e., LLVM Intermediate Representation), thus avoiding any further optimizations by the compiler. Classical genetic algorithms (GAs) are used to solve the studied problems, namely a basic cellular GA for the single-objective problems and the popular NSGA-II for the multi-objective ones. The promising results show the potential of the proposed technique.
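
To make the single-objective formulation concrete, the following is a minimal sketch of a cellular GA that evolves fixed-length sequences of LLVM pass names. It is not the authors' implementation. The pass pool, the grid size, the operators and all parameter values are assumptions chosen for illustration, and `toy_fitness` is a hypothetical stand-in: in a real setting the fitness would apply the candidate sequence to the program's LLVM IR and measure one of the complexity metrics, which the abstract does not detail.

```python
import random

# These are names of real LLVM passes, but this particular pool is an
# illustrative assumption, not the set used in the paper.
PASSES = ["mem2reg", "instcombine", "loop-unroll", "gvn",
          "simplifycfg", "licm", "sroa", "inline"]


def toy_fitness(seq):
    """Hypothetical stand-in for a complexity metric.

    Counts adjacent positions holding different passes. A real fitness
    would transform the IR with `seq` and measure the resulting code.
    """
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def neighbours(i, j, rows, cols):
    """Von Neumann neighbourhood (N, S, W, E) on a toroidal grid."""
    return [((i - 1) % rows, j), ((i + 1) % rows, j),
            (i, (j - 1) % cols), (i, (j + 1) % cols)]


def cellular_ga(fitness, rows=4, cols=4, seq_len=10,
                generations=30, p_mut=0.1, seed=0):
    """Synchronous cellular GA: each cell mates only with a grid neighbour."""
    rng = random.Random(seed)
    grid = [[[rng.choice(PASSES) for _ in range(seq_len)]
             for _ in range(cols)] for _ in range(rows)]
    for _ in range(generations):
        new_grid = [[None] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                current = grid[i][j]
                # Select the fittest neighbour as the mate.
                ni, nj = max(neighbours(i, j, rows, cols),
                             key=lambda p: fitness(grid[p[0]][p[1]]))
                mate = grid[ni][nj]
                # One-point crossover followed by per-gene mutation.
                cut = rng.randrange(1, seq_len)
                child = current[:cut] + mate[cut:]
                child = [rng.choice(PASSES) if rng.random() < p_mut else g
                         for g in child]
                # Replace the cell only if the child is not worse.
                if fitness(child) >= fitness(current):
                    new_grid[i][j] = child
                else:
                    new_grid[i][j] = current
        grid = new_grid
    return max((ind for row in grid for ind in row), key=fitness)


best = cellular_ga(toy_fitness)
print(best, toy_fitness(best))
```

Because a cell is replaced only when its child is at least as fit, the best fitness in the grid never decreases across generations. Swapping `toy_fitness` for a function that evaluates the transformed program is the only change needed to target a different metric.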
