Java字节码反编译器的优点和行为特点

2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM) Pub Date : 2019-08-19 DOI:10.1109/SCAM.2019.00019

Nicolas Harrand, César Soto-Valero, Monperrus Martin, B. Baudry

{"title":"Java字节码反编译器的优点和行为特点","authors":"Nicolas Harrand, César Soto-Valero, Monperrus Martin, B. Baudry","doi":"10.1109/SCAM.2019.00019","DOIUrl":null,"url":null,"abstract":"During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Modern Java decompilers tend to use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled has a direct impact on the quality of the source code produced by decompilers. We study the effectiveness of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion and semantic equivalence modulo inputs. This study relies on a benchmark set of 14 real-world open-source software projects to be decompiled (2041 classes in total). Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world programs. Even the highest ranking decompiler in this study produces syntactically correct output for 84% of classes of our dataset and semantically equivalent code output for 78% of classes.","PeriodicalId":431316,"journal":{"name":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"The Strengths and Behavioral Quirks of Java Bytecode Decompilers\",\"authors\":\"Nicolas Harrand, César Soto-Valero, Monperrus Martin, B. Baudry\",\"doi\":\"10.1109/SCAM.2019.00019\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Modern Java decompilers tend to use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled has a direct impact on the quality of the source code produced by decompilers. We study the effectiveness of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion and semantic equivalence modulo inputs. This study relies on a benchmark set of 14 real-world open-source software projects to be decompiled (2041 classes in total). Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world programs. Even the highest ranking decompiler in this study produces syntactically correct output for 84% of classes of our dataset and semantically equivalent code output for 78% of classes.\",\"PeriodicalId\":431316,\"journal\":{\"name\":\"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"volume\":\"100 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SCAM.2019.00019\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCAM.2019.00019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

在从Java源代码到字节码的编译过程中，一些信息将不可逆转地丢失。换句话说，Java代码的编译和反编译不是对称的。因此，以字节码生成源代码为目标的反编译过程必须建立一些策略来重建丢失的信息。现代Java反编译器倾向于使用不同的策略来实现正确的反编译。在这项工作中，我们假设字节码反编译的不同方式对反编译器生成的源代码的质量有直接的影响。我们研究了八个Java反编译器在三个质量指标方面的有效性:语法正确性，语法失真和语义等价模输入。这项研究依赖于14个真实开源软件项目的基准集进行反编译(总共2041个类)。我们的结果表明，没有一个单一的现代反编译器能够正确地处理来自现实世界程序的各种字节码结构。即使是本研究中排名最高的反编译器，也会为我们数据集中84%的类产生语法正确的输出，为78%的类产生语义等效的代码输出。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

The Strengths and Behavioral Quirks of Java Bytecode Decompilers

During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code is not symmetric. Consequently, the decompilation process, which aims at producing source code from bytecode, must establish some strategies to reconstruct the information that has been lost. Modern Java decompilers tend to use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled has a direct impact on the quality of the source code produced by decompilers. We study the effectiveness of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion and semantic equivalence modulo inputs. This study relies on a benchmark set of 14 real-world open-source software projects to be decompiled (2041 classes in total). Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world programs. Even the highest ranking decompiler in this study produces syntactically correct output for 84% of classes of our dataset and semantically equivalent code output for 78% of classes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM)

自引率

0.00%

发文量