Recovery of High-Level Intermediate Representations of Algorithms from Binary Code

2019 Ivannikov Memorial Workshop (IVMEM). Pub Date: 2019-09-01. DOI: 10.1109/IVMEM.2019.00015

A. Bugerya, I. Kulagin, V. Padaryan, M. A. Solovev, A. Tikhonov



Recovery of High-Level Intermediate Representations of Algorithms from Binary Code

One task of binary code security analysis is detecting undocumented features in software. This task is hard to automate and requires the participation of a cybersecurity expert. How the algorithm under analysis is represented strongly determines both the analysis effort and the quality of the results. Existing intermediate representations and languages are intended for software that either performs optimizing transformations or analyzes binary code. Such representations and languages are unsuitable for manual data flow analysis. This paper proposes a high-level, hierarchical, flowchart-based representation of a program's algorithm, together with an algorithm for constructing it. The representation is based on a hypergraph and supports both automatic and manual data flow analysis at different levels of detail. The hypergraph nodes represent functions. Each node contains a set of nested nodes called fragments; a fragment is a linear sequence of instructions that contains no call or ret instructions. Edges represent data flows between nodes and correspond to memory buffers and registers. In the future, this representation can be used to implement automatic analysis algorithms. The paper also proposes an approach to improving the quality of the resulting representation by grouping individual data flows into a single flow that connects logical modules of the algorithm.
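The structure described in the abstract can be illustrated with a small sketch. It is not the authors' construction algorithm, which the abstract does not detail. The class names (`Fragment`, `FunctionNode`, `DataFlow`), the helper functions, the textual instruction format, and the choice to group flows by their endpoint pair are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Fragment:
    """A linear run of instructions with no call or ret inside it."""
    name: str
    instructions: List[str]


@dataclass
class FunctionNode:
    """A top-level node: one function holding its nested fragments."""
    name: str
    fragments: List[Fragment] = field(default_factory=list)


@dataclass(frozen=True)
class DataFlow:
    """An edge: data passed from src to dst via a register or memory buffer."""
    src: str
    dst: str
    carrier: str


def split_into_fragments(func_name: str, instructions: List[str]) -> FunctionNode:
    """Cut a function body at every call/ret (hypothetical text format).

    The call/ret instruction itself is treated as a boundary and is not
    stored in any fragment.
    """
    node = FunctionNode(func_name)
    current: List[str] = []

    def flush() -> None:
        # Close the fragment being built, if it holds any instructions.
        if current:
            name = f"{func_name}.f{len(node.fragments)}"
            node.fragments.append(Fragment(name, list(current)))
            current.clear()

    for insn in instructions:
        parts = insn.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] in ("call", "ret"):
            flush()
        else:
            current.append(insn)
    flush()
    return node


def group_flows(flows: List[DataFlow]) -> Dict[Tuple[str, str], List[str]]:
    """Merge single flows that share both endpoints into one grouped flow.

    Grouping by (src, dst) is an assumed stand-in for the paper's grouping
    between logical algorithm modules.
    """
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for flow in flows:
        grouped[(flow.src, flow.dst)].append(flow.carrier)
    return dict(grouped)


body = ["push rbp", "mov rdi, rax", "call parse", "mov [rbx], eax", "ret"]
node = split_into_fragments("main", body)

flows = [
    DataFlow("main.f0", "parse", "rdi"),
    DataFlow("main.f0", "parse", "rsi"),
    DataFlow("parse", "main.f1", "eax"),
]
grouped = group_flows(flows)
```

Here `main` is split into two fragments around the `call`, and the two register flows from `main.f0` to `parse` collapse into a single grouped edge, which is the kind of reduction in visual detail that the abstract's grouping approach aims for.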
