上下文线程:一种灵活高效的虚拟机解释器调度技术

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization Pub Date : 2005-03-20 DOI:10.1109/CGO.2005.14

Marc Berndl, B. Vitale, M. Zaleski, Angela Demke Brown

{"title":"上下文线程:一种灵活高效的虚拟机解释器调度技术","authors":"Marc Berndl, B. Vitale, M. Zaleski, Angela Demke Brown","doi":"10.1109/CGO.2005.14","DOIUrl":null,"url":null,"abstract":"Direct-threaded interpreters use indirect branches to dispatch bytecodes, but deeply-pipelined architectures rely on branch prediction for performance. Due to the poor correlation between the virtual program's control flow and the hardware program counter, which we call the context problem, direct threading's indirect branches are poorly predicted by the hardware, limiting performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, aligning the hardware and virtual PC. Thus, sequential control flow is predicted by the hardware return stack. We convert virtual branching instructions to native branches, mobilizing the hardware's branch prediction resources. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium and PowerPC architectures. On the Pentium IV our technique reduces mean mispredicted branches by 95%. On the PowerPC, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Due to reduced branch hazards, context threading reduces mean execution time by 25% for Java and by 19% and 37% for OCaml on the P4 and PPC970, respectively. We also combine context threading with a conservative inlining technique and find its performance comparable to that of selective inlining.","PeriodicalId":92120,"journal":{"name":"Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization","volume":"72 1","pages":"15-26"},"PeriodicalIF":0.0000,"publicationDate":"2005-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"50","resultStr":"{\"title\":\"Context threading: a flexible and efficient dispatch technique for virtual machine interpreters\",\"authors\":\"Marc Berndl, B. Vitale, M. Zaleski, Angela Demke Brown\",\"doi\":\"10.1109/CGO.2005.14\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Direct-threaded interpreters use indirect branches to dispatch bytecodes, but deeply-pipelined architectures rely on branch prediction for performance. Due to the poor correlation between the virtual program's control flow and the hardware program counter, which we call the context problem, direct threading's indirect branches are poorly predicted by the hardware, limiting performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, aligning the hardware and virtual PC. Thus, sequential control flow is predicted by the hardware return stack. We convert virtual branching instructions to native branches, mobilizing the hardware's branch prediction resources. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium and PowerPC architectures. On the Pentium IV our technique reduces mean mispredicted branches by 95%. On the PowerPC, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Due to reduced branch hazards, context threading reduces mean execution time by 25% for Java and by 19% and 37% for OCaml on the P4 and PPC970, respectively. We also combine context threading with a conservative inlining technique and find its performance comparable to that of selective inlining.\",\"PeriodicalId\":92120,\"journal\":{\"name\":\"Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization\",\"volume\":\"72 1\",\"pages\":\"15-26\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-03-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"50\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CGO.2005.14\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CGO.2005.14","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 50

摘要

直接线程解释器使用间接分支来分派字节码，但深度流水线架构依赖于分支预测来提高性能。由于虚拟程序的控制流与硬件程序计数器之间的相关性较差(我们称之为上下文问题)，直接线程的间接分支无法被硬件预测，从而限制了性能。我们的调度技术，上下文线程，通过对齐硬件和虚拟机状态来提高分支预测和性能。线性虚拟指令与本地调用和返回一起调度，对齐硬件和虚拟PC。因此，顺序控制流是由硬件返回堆栈预测的。我们将虚拟分支指令转换为本地分支，调动硬件的分支预测资源。我们在Pentium和PowerPC架构上使用Java和OCaml解释器来评估上下文线程对分支预测和性能的影响。在奔腾IV上，我们的技术减少了95%的平均错误预测分支。在PowerPC上，OCaml和Java的平均分支停滞周期分别减少了75%和82%。由于减少了分支危险，上下文线程在P4和PPC970上分别将Java的平均执行时间减少了25%，OCaml的平均执行时间减少了19%和37%。我们还将上下文线程与保守内联技术相结合，发现其性能可与选择性内联相媲美。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Context threading: a flexible and efficient dispatch technique for virtual machine interpreters

Direct-threaded interpreters use indirect branches to dispatch bytecodes, but deeply-pipelined architectures rely on branch prediction for performance. Due to the poor correlation between the virtual program's control flow and the hardware program counter, which we call the context problem, direct threading's indirect branches are poorly predicted by the hardware, limiting performance. Our dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state. Linear virtual instructions are dispatched with native calls and returns, aligning the hardware and virtual PC. Thus, sequential control flow is predicted by the hardware return stack. We convert virtual branching instructions to native branches, mobilizing the hardware's branch prediction resources. We evaluate the impact of context threading on both branch prediction and performance using interpreters for Java and OCaml on the Pentium and PowerPC architectures. On the Pentium IV our technique reduces mean mispredicted branches by 95%. On the PowerPC, it reduces mean branch stall cycles by 75% for OCaml and 82% for Java. Due to reduced branch hazards, context threading reduces mean execution time by 25% for Java and by 19% and 37% for OCaml on the P4 and PPC970, respectively. We also combine context threading with a conservative inlining technique and find its performance comparable to that of selective inlining.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the ... CGO : International Symposium on Code Generation and Optimization. International Symposium on Code Generation and Optimization

自引率

0.00%

发文量