A study of Java virtual machine scalability issues on SMP systems

IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005. Pub Date: 2005-11-07. DOI: 10.1109/IISWC.2005.1526008

Zhongbo Cao, Wei Huang, J.M. Chang


Citations: 7

Abstract

This paper studies the scalability issues of the Java virtual machine (JVM) on symmetric multiprocessing (SMP) systems. Using a cycle-accurate simulator, we evaluate how the performance of multithreaded Java benchmarks scales with the number of processors and application threads. By correlating low-level hardware performance data with two high-level software constructs, thread types and memory regions, we present a detailed performance analysis and study the potential impact of memory system latencies and resource contention on scalability. The paper reports several key findings. First, among the components of memory access latency, most memory stalls are produced by L2 cache misses and cache-to-cache transfers. Second, among the memory regions, the Java heap produces the most memory stalls. Additionally, a large majority of memory stalls occur in application threads, as opposed to other JVM threads. Furthermore, we find that increasing either the number of processors or the number of application threads, independently of the other, increases the L2 cache miss ratio and the cache-to-cache transfer ratio. This problem can be alleviated by using a thread-local heap or allocation buffer, which improves L2 cache performance. For certain benchmarks such as Raytracer, cache-to-cache transfers, which are dominated by false sharing, can be significantly reduced. Our experiments also show that a thread-local allocation buffer with a size between 16KB and 256KB often leads to optimal performance.
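The thread-local allocation buffer mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation or any real JVM's allocator; the class names `TlabSketch`, `SharedHeap`, and `Tlab` are hypothetical, and the heap is modeled as a plain address range. The idea shown is that each thread reserves a chunk from the shared heap with one atomic operation and then allocates inside that chunk with an unsynchronized bump pointer, so objects allocated by different threads tend to land in different cache lines.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of thread-local allocation buffers; addresses are simulated. */
class TlabSketch {
    /** Shared heap modeled as an address range; reserving a chunk is the only contended step. */
    static final class SharedHeap {
        private final AtomicLong top = new AtomicLong(0);
        private final long capacity;

        SharedHeap(long capacity) { this.capacity = capacity; }

        /** Atomically reserves size bytes; returns the start address, or -1 if the heap is exhausted. */
        long reserve(long size) {
            while (true) {
                long start = top.get();
                long end = start + size;
                if (end > capacity) return -1;
                if (top.compareAndSet(start, end)) return start;
            }
        }
    }

    /** Per-thread buffer: bump-pointer allocation with no synchronization on the fast path. */
    static final class Tlab {
        private final SharedHeap heap;
        private final long tlabSize;
        private long cursor = 0;
        private long limit = 0;
        long refills = 0;

        Tlab(SharedHeap heap, long tlabSize) {
            this.heap = heap;
            this.tlabSize = tlabSize;
        }

        /** Returns a simulated address for an object of the given size, or -1 on exhaustion. */
        long allocate(long size) {
            if (size > tlabSize) return heap.reserve(size); // large objects bypass the buffer
            if (cursor + size > limit) {
                // Buffer is full (or not yet created): abandon the remainder and take a new chunk.
                long start = heap.reserve(tlabSize);
                if (start < 0) return -1;
                cursor = start;
                limit = start + tlabSize;
                refills++;
            }
            long address = cursor;
            cursor += size; // thread-private bump, no atomic needed
            return address;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SharedHeap heap = new SharedHeap(1L << 20); // 1 MB simulated heap
        Tlab[] tlabs = { new Tlab(heap, 16 * 1024), new Tlab(heap, 16 * 1024) };
        Thread[] workers = new Thread[tlabs.length];
        for (int i = 0; i < workers.length; i++) {
            final Tlab mine = tlabs[i]; // each thread owns exactly one buffer
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) mine.allocate(64);
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        for (int i = 0; i < tlabs.length; i++) {
            System.out.println("thread " + i + " refills: " + tlabs[i].refills);
        }
    }
}
```

In this sketch the buffer size trades contention against waste: a larger buffer means fewer atomic reservations on the shared heap, while a smaller one abandons less unused space at each refill. The 16KB value used in `main` is simply the low end of the range quoted in the abstract.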

