Accelerating sequential consistency for Java with speculative compilation

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation. Pub Date: 2019-06-08. DOI: 10.1145/3314221.3314611

Lun Liu, T. Millstein, M. Musuvathi


Citations: 6

Abstract

A memory consistency model (or simply a memory model) specifies the granularity and the order in which memory accesses by one thread become visible to other threads in the program. We previously proposed the volatile-by-default (VBD) memory model as a natural form of sequential consistency (SC) for Java. VBD is significantly stronger than the Java memory model (JMM) and incurs relatively modest overheads in a modified HotSpot JVM running on Intel x86 hardware. However, the x86 memory model is already quite close to SC. It is expected that the cost of VBD will be much higher on the other widely used hardware platform today, namely ARM, whose memory model is very weak. In this paper, we quantify this expectation by building and evaluating a baseline volatile-by-default JVM for ARM called VBDA-HotSpot, using the same technique previously used for x86. Through this baseline we report, to the best of our knowledge, the first comprehensive study of the cost of providing language-level SC for a production compiler on ARM. VBDA-HotSpot indeed incurs a considerable performance penalty on ARM, with average overheads on the DaCapo benchmarks on two ARM servers of 57% and 73% respectively. Motivated by these experimental results, we then present a novel speculative technique to optimize language-level SC. While several prior works have shown how to optimize SC in the context of an offline, whole-program compiler, to our knowledge this is the first optimization approach that is compatible with modern implementation technology, including dynamic class loading and just-in-time (JIT) compilation. The basic idea is to modify the JIT compiler to treat each object as thread-local initially, so accesses to its fields can be compiled without fences. If an object is ever accessed by a second thread, any speculatively compiled code for the object is removed, and future JITed code for the object will include the necessary fences in order to ensure SC. 
We demonstrate that this technique is effective, reducing the overhead of enforcing VBD by one-third on average, and additional experiments validate the thread-locality hypothesis that underlies the approach.
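
The speculation policy described in the abstract can be illustrated with a small user-level model. This is only a sketch under stated assumptions: the paper applies the idea inside the JIT compiler by discarding and recompiling code, whereas the hypothetical `SpeculativeCell` class below checks on every access whether the object is still thread-local. The class, its method names, the counters, and the use of `VarHandle.fullFence()` as the "necessary fence" are all illustrative choices, not details taken from the paper.

```java
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the policy only; not the paper's JIT implementation.
final class SpeculativeCell {
    // First thread to touch the cell (assumed stand-in for per-object owner metadata).
    private final AtomicReference<Thread> owner = new AtomicReference<>();
    // Becomes true once a second thread touches the cell; never reset.
    private volatile boolean shared;
    private int value;

    // Counters exist only so the demo can report which path each access took.
    private final AtomicInteger unfenced = new AtomicInteger();
    private final AtomicInteger fenced = new AtomicInteger();

    // Returns true while the thread-locality speculation still holds.
    private boolean stillThreadLocal() {
        if (shared) return false;
        Thread me = Thread.currentThread();
        if (owner.get() == me || owner.compareAndSet(null, me)) return true;
        shared = true; // a second thread arrived: abandon the speculation for good
        return false;
    }

    int read() {
        if (stillThreadLocal()) {
            unfenced.incrementAndGet();
            return value; // speculative path: plain load, no fence
        }
        fenced.incrementAndGet();
        VarHandle.fullFence(); // conservative path: fence around the access
        int v = value;
        VarHandle.fullFence();
        return v;
    }

    void write(int v) {
        if (stillThreadLocal()) {
            unfenced.incrementAndGet();
            value = v; // speculative path: plain store, no fence
            return;
        }
        fenced.incrementAndGet();
        VarHandle.fullFence();
        value = v;
        VarHandle.fullFence();
    }

    boolean isShared() { return shared; }
    int unfencedCount() { return unfenced.get(); }
    int fencedCount() { return fenced.get(); }
}

final class SpeculativeLocalityDemo {
    public static void main(String[] args) throws InterruptedException {
        SpeculativeCell cell = new SpeculativeCell();
        cell.write(1); // owner thread: fence-free
        cell.read();   // owner thread: fence-free
        System.out.println("shared before second thread: " + cell.isShared());

        Thread other = new Thread(() -> cell.write(2)); // second thread ends the speculation
        other.start();
        other.join();

        cell.read(); // the original owner now takes the fenced path as well
        System.out.println("shared after second thread: " + cell.isShared());
        System.out.println("unfenced=" + cell.unfencedCount() + " fenced=" + cell.fencedCount());
    }
}
```

The `shared` flag is deliberately one-way, mirroring the abstract's statement that future JITed code for an object accessed by a second thread includes fences. The sketch does not model the hard part of a real implementation: ensuring that fence-free code already running on the owner thread is stopped before the second thread's access proceeds. The per-access check and atomic counters also add their own cost, so this model says nothing about performance.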


