J. C. de la Torre, Javier Jareño, J. M. Aragón-Jurado, Sébastien Varrette, B. Dorronsoro
Source code obfuscation with genetic algorithms using LLVM code optimizations
DOI: 10.1093/jigpal/jzae069 (https://doi.org/10.1093/jigpal/jzae069)
Published: 2024-05-14 (Journal Article)
Abstract
With the advent of the cloud computing model, which allows shared access to massive computing facilities, there is a surging demand for protecting the intellectual property tied to programs executed on these uncontrolled systems. While novel paradigms such as confidential computing aim to protect the data manipulated during execution, obfuscation techniques (in particular at the source code level) remain a popular way to conceal the purpose of a program or its logic without altering its functionality, thus preventing reverse engineering of the program even with the help of computing resources. The many advantages of code obfuscation, together with its low cost, make it a popular technique. This paper proposes a novel methodology for source code obfuscation that can be used together with traditional obfuscation techniques, making the code more robust against reverse-engineering attacks. Three program complexity metrics are used to define three single-objective combinatorial optimization versions of the problem, which are solved and analysed. Additionally, three multi-objective problems are defined, each pairing one of the selected metrics with the program execution time, so that strong obfuscation does not come at the cost of heavily penalized performance. The goal of the defined problems is to find sequences of LLVM optimizations that lead to highly obfuscated versions of the original code. These transformations are applied to the back-end pseudo-assembly code (i.e., LLVM Intermediate Representation), thus avoiding any further optimizations by the compiler. Classical genetic algorithms (GAs) are used to solve the studied problems, namely a basic cellular GA for the single-objective problems and the popular NSGA-II for the multi-objective ones. The promising results show the potential of the proposed technique.
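The single-objective formulation described above, searching for a sequence of optimization passes that maximizes a complexity score, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a plain generational GA with elitism rather than the cellular GA named in the abstract, the pass names are only labels (nothing is compiled), and `toy_complexity` is a hypothetical stand-in for a real complexity metric, which would have to be measured on the transformed LLVM IR.

```python
import random

# Illustrative pass labels only; the abstract does not list the pass set used.
PASSES = ["mem2reg", "instcombine", "simplifycfg", "loop-unroll", "gvn", "licm"]
SEQ_LEN = 8  # assumed fixed sequence length for this sketch


def toy_complexity(seq):
    """Hypothetical fitness: a stand-in for a program complexity metric.

    A real evaluation would apply the passes to the IR and measure the
    result. Here we count adjacent pairs of different passes plus the
    number of distinct passes, so that ordering affects the score.
    """
    changes = sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    return changes + len(set(seq))


def tournament(pop, rng, k=2):
    # Pick k random individuals and keep the fittest one.
    return max(rng.sample(pop, k), key=toy_complexity)


def crossover(a, b, rng):
    # One-point crossover between two parent sequences.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(seq, rng, rate=0.1):
    # Replace each position with a random pass with probability `rate`.
    return [rng.choice(PASSES) if rng.random() < rate else p for p in seq]


def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(PASSES) for _ in range(SEQ_LEN)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=toy_complexity)  # elitism: keep the current best
        children = [
            mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
    return max(pop, key=toy_complexity)


if __name__ == "__main__":
    best = evolve()
    print(best, toy_complexity(best))
```

To move toward the setting in the abstract, `toy_complexity` would be replaced by a function that runs the chosen passes on the program's IR and computes one of the complexity metrics. The multi-objective variants would additionally measure execution time and hand both objectives to an NSGA-II implementation.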