Bytecode fetch optimization for a Java interpreter

ASPLOS X. Pub Date: 2002-10-01. DOI: 10.1145/605397.605404

Kazunori Ogata, H. Komatsu, T. Nakatani


Citations: 10

Abstract

Interpreters play an important role in many languages, and their performance is particularly critical for the popular language Java. Interpreter performance matters even for high-performance virtual machines that employ just-in-time compiler technology, because there are advantages in delaying the start of compilation and in reducing the number of methods to be compiled. Many techniques have been proposed to improve the performance of various interpreters, but none of them has fully addressed two issues inherent to interpreters running on superscalar processors: redundant memory accesses and the overhead of indirect branches. These issues are especially serious for Java because each bytecode is typically one or a few bytes long, and the execution routine for each bytecode is also short owing to the low-level, stack-based semantics of Java bytecode. In this paper, we describe three novel techniques used in our Java bytecode interpreter, write-through top-of-stack caching (WT), position-based handler customization (PHC), and position-based speculative decoding (PSD), which ameliorate these problems on PowerPC processors. We show how each technique contributes to improving the overall performance of the interpreter for major Java benchmark programs on an IBM POWER3 processor. Of the three, PHC is the most effective. We also show that bytecode fetches are the main source of memory accesses and that PHC successfully eliminates the majority of them while keeping the instruction cache miss ratios small.
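To make the idea of top-of-stack caching concrete, here is a minimal sketch in Java. It is not the paper's interpreter, which targets PowerPC and is not described in detail in the abstract. The opcode set, the class name `TosSketch`, and the `stackLoads` counter are all hypothetical, and the "write-through" behavior shown (every push still stores to the in-memory stack, while the top value is also kept in a local variable) is an assumed reading of the technique's name. The sketch only illustrates how keeping the top of the stack in a local can remove loads from the operand stack.

```java
// Toy stack interpreter. Opcodes and names are hypothetical, not from the paper.
final class TosSketch {
    static final byte PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    // Counts loads from the in-memory operand stack during the last run.
    static int stackLoads;

    // Baseline: every operand is read back from the in-memory stack.
    static int runBaseline(byte[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        stackLoads = 0;
        while (true) {
            byte op = code[pc++];                 // bytecode fetch
            switch (op) {                         // dispatch to the handler
                case PUSH:
                    stack[sp++] = code[pc++];     // operand fetch, then store
                    break;
                case ADD: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stackLoads += 2;
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stackLoads += 2;
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    stackLoads++;
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    // Assumed write-through variant: `tos` always mirrors stack[sp - 1].
    // Every push still stores to memory, so only loads are saved.
    static int runWriteThrough(byte[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0, tos = 0;
        stackLoads = 0;
        while (true) {
            byte op = code[pc++];
            switch (op) {
                case PUSH:
                    tos = code[pc++];
                    stack[sp++] = tos;            // write through to memory
                    break;
                case ADD: {
                    int a = stack[sp - 2];        // only the second operand is loaded
                    stackLoads++;
                    sp -= 2;
                    tos = a + tos;
                    stack[sp++] = tos;
                    break;
                }
                case MUL: {
                    int a = stack[sp - 2];
                    stackLoads++;
                    sp -= 2;
                    tos = a * tos;
                    stack[sp++] = tos;
                    break;
                }
                case HALT:
                    return tos;                   // no load needed
                default:
                    throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        byte[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(runBaseline(program) + " with " + stackLoads + " stack loads");
        System.out.println(runWriteThrough(program) + " with " + stackLoads + " stack loads");
    }
}
```

Because the in-memory stack is always up to date in this variant, the handlers never need to track whether the cached value is dirty, which keeps each handler short. The sketch does not attempt to model PHC or PSD, since the abstract does not say how they work.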
