Deep Differential Testing of JVM Implementations

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), Pub Date: 2019-05-01, DOI: 10.1109/ICSE.2019.00127

Yuting Chen, Ting Su, Z. Su

Pages: 1257-1268

Citations: 58

Abstract

The Java Virtual Machine (JVM) is the cornerstone of the widely used Java platform. Thus, it is critical to ensure the reliability and robustness of popular JVM implementations. However, little research exists on validating production JVMs. One notable effort is classfuzz, which mutates Java bytecode syntactically to stress-test different JVMs. It has been shown that classfuzz mainly produces illegal bytecode files and uncovers defects in JVMs' startup processes. It remains a challenge to effectively test JVMs' bytecode verifiers and execution engines to expose deeper bugs. This paper tackles this challenge by introducing classming, a novel, effective approach to performing deep, differential JVM testing. The key to classming is a technique, live bytecode mutation, that generates, from a seed bytecode file f, likely valid, executable (live) bytecode files: (1) capture the seed f's live bytecode, i.e., the sequence of its executed bytecode instructions; (2) repeatedly manipulate the control- and data-flow in f's live bytecode to generate semantically different mutants; and (3) selectively accept the generated mutants to steer the mutation process toward live, diverse mutants. The generated mutants are then employed to differentially test JVMs. We have evaluated classming on mainstream JVM implementations, including OpenJDK's HotSpot and IBM's J9, by mutating the DaCapo benchmarks. Our results show that classming is very effective in uncovering deep JVM differences. More than 1,800 of the generated classes exposed JVM differences, and more than 30 triggered JVM crashes. We analyzed and reported the JVM runtime differences and crashes, of which 14 have already been confirmed/fixed, including a highly critical security vulnerability in J9 that allowed untrusted code to disable the security manager and elevate its privileges (CVE-2017-1376).
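
The three steps of live bytecode mutation, followed by differential testing, can be illustrated with a toy model. The sketch below is not classming's implementation: it replaces real JVM bytecode with a hypothetical three-instruction stack language (`push`, `add`, `jmp`), replaces the JVMs under test with two toy interpreters (a strict one and a lenient one), and assumes a simplified acceptance rule (keep a mutant if it terminates within a step budget and its executed trace differs from its parent's). The abstract does not specify the acceptance criterion, so that rule and every name in the code are assumptions made for illustration only.

```python
import random


def run(program, lenient=False, max_steps=200):
    """Interpret a toy stack program; return (result, live_trace).

    result is ("ok", final_stack), "error", or "timeout". live_trace lists
    the indices of the instructions that actually executed, in order.
    """
    stack, pc, trace = [], 0, []
    while pc < len(program) and len(trace) < max_steps:
        trace.append(pc)
        op = program[pc]
        if op[0] == "push":
            stack.append(op[1])
            pc += 1
        elif op[0] == "add":
            if len(stack) < 2:
                if not lenient:
                    return "error", trace  # strict toy VM rejects underflow
                stack += [0] * (2 - len(stack))  # lenient toy VM pads zeros
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op[0] == "jmp":
            pc = op[1]
        else:
            return "error", trace  # unknown instruction
    if pc < len(program):
        return "timeout", trace  # step budget exhausted
    return ("ok", tuple(stack)), trace


def mutate(program, live, rng):
    """Step 2 (toy): insert a jump right after a randomly chosen live instruction."""
    i = rng.choice(live)
    # Retarget existing jumps that point past the insertion site.
    shifted = [("jmp", op[1] + 1) if op[0] == "jmp" and op[1] > i else op
               for op in program]
    target = rng.randrange(len(program) + 2)  # any index, or one past the end
    return shifted[:i + 1] + [("jmp", target)] + shifted[i + 1:]


def live_mutation(seed, iterations, rng):
    """Steps 1-3 (toy): trace the seed, mutate repeatedly, accept selectively."""
    current = seed
    _, live = run(current)  # step 1: capture the live instruction sequence
    accepted = []
    for _ in range(iterations):
        mutant = mutate(current, live, rng)
        result, trace = run(mutant)
        # Assumed acceptance rule: mutant terminates and executes differently.
        if result != "timeout" and trace != live:
            current, live = mutant, trace
            accepted.append(mutant)
    return accepted


def differential_test(programs):
    """Return the programs on which the strict and lenient toy VMs disagree."""
    return [p for p in programs
            if run(p, lenient=False)[0] != run(p, lenient=True)[0]]


if __name__ == "__main__":
    seed = [("push", 1), ("push", 2), ("add",)]
    mutants = live_mutation(seed, iterations=50, rng=random.Random(0))
    diffs = differential_test(mutants)
    print(f"{len(mutants)} accepted mutants, {len(diffs)} expose a VM difference")
```

In this toy setting, a discrepancy arises when a mutated jump skips a `push`, so that `add` runs on too few operands: the strict interpreter reports an error while the lenient one continues. This loosely mirrors how a control-flow mutation in live code can reach behavior on which real JVMs disagree.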
