HyungSeok Han, JeongOh Kyea, Yonghwi Jin, Jinoh Kang, Brian Pak, Insu Yun
2023 IEEE Symposium on Security and Privacy (SP), published 2023-05-01. DOI: 10.1109/SP46215.2023.10179314 (https://doi.org/10.1109/SP46215.2023.10179314)
QueryX: Symbolic Query on Decompiled Code for Finding Bugs in COTS Binaries
Extensible static checking tools, such as Sys and CodeQL, have successfully discovered bugs in source code. These tools allow analysts to write application-specific rules, referred to as queries. Queries leverage the analysts' domain knowledge, making the analysis more accurate and scalable. However, most of these tools are inapplicable to binary-only analysis. One exception, joern, translates binary code into decompiled code and feeds the decompiled code into an ordinary C code analyzer. However, this approach is not sufficiently precise for symbolic analysis, as it overlooks the unique characteristics of decompiled code. Binary analysis platforms such as angr do support symbolic analysis, but they require analysts to understand their intermediate representations (IRs), even though analysts mostly work with decompiled code.

In this paper, we propose a precise and scalable symbolic analysis, called fearless symbolic analysis, that uses intuitive queries for binary code, and we implement it in QueryX. To make queries intuitive, QueryX enables analysts to write them on top of decompiled code instead of IRs. In particular, QueryX supports callbacks on decompiled code, through which analysts can control symbolic analysis to discover bugs in the code. For precise analysis, we lift decompiled code into our IR, named DNR, and perform symbolic analysis on DNR while considering the characteristics of the decompiled code. Notably, DNR is used only internally, so analysts can write queries without working with DNR directly. For scalability, QueryX automatically reduces control-flow graphs using the callbacks and the ordering dependencies between callbacks that are specified in the queries. We applied QueryX to the Windows kernel, the Windows system service, and an automotive binary. As a result, we found 15 unique bugs, including 10 CVEs, and earned $180,000 from the Microsoft bug bounty program.
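
The abstract gives no algorithm for the control-flow graph reduction, so the following is only an illustrative sketch of the general idea, not QueryX's implementation or API. It assumes a CFG stored as a plain adjacency dictionary and a query that requires a "first" callback site to execute before a "second" one. The function names (`reachable`, `reverse`, `reduce_cfg`) and the node labels are hypothetical. The sketch keeps only nodes that lie on some path from the entry, through a first site, to a second site.

```python
from collections import deque


def reachable(graph, sources):
    """Return every node reachable from `sources` (inclusive), via BFS."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def reverse(graph):
    """Return the graph with every edge flipped."""
    rev = {}
    for node, succs in graph.items():
        rev.setdefault(node, [])
        for succ in succs:
            rev.setdefault(succ, []).append(node)
    return rev


def reduce_cfg(graph, entry, first_sites, second_sites):
    """Hypothetical reduction: keep nodes on an entry -> first -> second path.

    `first_sites` and `second_sites` stand in for CFG nodes where two
    callbacks would fire, with the assumed ordering "first before second".
    """
    rev = reverse(graph)
    from_entry = reachable(graph, [entry])
    to_second = reachable(rev, second_sites)

    # First sites that the entry can reach and that can reach a second site.
    valid_first = set(first_sites) & from_entry & to_second
    if not valid_first:
        return {}  # the ordering can never be satisfied

    from_first = reachable(graph, valid_first)
    valid_second = set(second_sites) & from_first

    before = from_entry & reachable(rev, valid_first)    # entry ... first
    between = from_first & reachable(rev, valid_second)  # first ... second
    keep = before | between

    # Induced subgraph on the kept nodes.
    return {n: [s for s in graph.get(n, ()) if s in keep] for n in keep}


if __name__ == "__main__":
    cfg = {
        "entry": ["a", "x"],
        "a": ["first_cb"],
        "x": ["y"],            # branch that never reaches a callback site
        "y": [],
        "first_cb": ["b", "z"],
        "b": ["second_cb"],
        "z": ["exit"],         # leaves without reaching the second site
        "second_cb": ["exit"],
        "exit": [],
    }
    print(reduce_cfg(cfg, "entry", ["first_cb"], ["second_cb"]))
```

In this toy graph the branches through `x`, `y`, `z`, and `exit` are dropped because they cannot contribute to a path that satisfies the assumed ordering. A symbolic executor would then have fewer paths to explore, which is the kind of saving the abstract attributes to callback-driven reduction.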