A. Bugerya, I. Kulagin, V. Padaryan, M. A. Solovev, A. Tikhonov
Recovery of High-Level Intermediate Representations of Algorithms from Binary Code
2019 Ivannikov Memorial Workshop (IVMEM), published 2019-09-01
DOI: 10.1109/IVMEM.2019.00015 (https://doi.org/10.1109/IVMEM.2019.00015)
Citations: 0
Abstract
One task of binary code security analysis is detecting undocumented features in software. This task is hard to automate and requires the participation of a cybersecurity expert. How the algorithm under analysis is represented strongly determines both the effort the analysis takes and the quality of its results. Existing intermediate representations and languages are intended for software that either performs optimizing transformations or analyzes binary code, and they are unsuitable for manual data flow analysis. This paper proposes a high-level, hierarchical, flowchart-based representation of a program's algorithm, together with an algorithm for constructing it. The representation is based on a hypergraph and supports both automatic and manual data flow analysis at different levels of detail. The hypergraph nodes represent functions. Each node contains a set of other nodes, called fragments. A fragment is a linear sequence of instructions that contains no call or ret instructions. Edges represent data flows between nodes and correspond to memory buffers and registers. In the future, this representation can be used to implement automatic analysis algorithms. The paper also proposes an approach to improving the quality of the resulting representation by grouping individual data flows into a single flow that connects logical modules of the algorithm.
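The structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' construction algorithm. The names `Fragment`, `FunctionNode`, `DataFlow`, `split_into_fragments`, and `group_flows` are hypothetical, as is the `module_of` mapping from fragments to logical modules. The sketch assumes that fragments are obtained by cutting a function's instruction list at every call and ret, and it ignores branches. It also models each data flow as an ordinary edge with two endpoints, which simplifies the hypergraph named in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Fragment:
    """A linear run of instructions containing no call or ret."""
    name: str
    instructions: List[str]


@dataclass
class FunctionNode:
    """A node for one function; it contains fragment nodes."""
    name: str
    fragments: List[Fragment] = field(default_factory=list)


@dataclass(frozen=True)
class DataFlow:
    """A single data flow between two fragments.

    `carrier` names the register or memory buffer that carries the data.
    """
    src: str
    dst: str
    carrier: str


def split_into_fragments(func_name: str, instructions: List[str]) -> FunctionNode:
    """Cut the instruction list at call/ret (an assumption of this sketch)."""
    node = FunctionNode(func_name)
    current: List[str] = []

    def flush() -> None:
        # Close the current fragment, if it holds any instructions.
        if current:
            name = f"{func_name}.f{len(node.fragments)}"
            node.fragments.append(Fragment(name, list(current)))
            current.clear()

    for ins in instructions:
        parts = ins.split()
        if not parts:
            continue  # skip blank lines
        if parts[0].lower() in ("call", "ret"):
            flush()  # the call/ret itself belongs to no fragment
        else:
            current.append(ins)
    flush()
    return node


def group_flows(
    flows: List[DataFlow], module_of: Dict[str, str]
) -> Dict[Tuple[str, str], List[str]]:
    """Merge single flows between the same pair of modules into one flow."""
    grouped = defaultdict(set)
    for flow in flows:
        key = (module_of[flow.src], module_of[flow.dst])
        grouped[key].add(flow.carrier)
    # One entry per module pair, listing the carriers it bundles together.
    return {key: sorted(carriers) for key, carriers in grouped.items()}
```

For example, calling `split_into_fragments("parse", [...])` on a listing that contains one `call` in the middle and a trailing `ret` yields two fragments, `parse.f0` and `parse.f1`. Passing the flows between those fragments and their callee to `group_flows` then leaves one grouped flow per ordered pair of modules, which is the kind of coarser view the grouping approach aims for.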