{"title":"A study of Java virtual machine scalability issues on SMP systems","authors":"Zhongbo Cao, Wei Huang, J.M. Chang","doi":"10.1109/IISWC.2005.1526008","DOIUrl":null,"url":null,"abstract":"This paper studies the scalability issues of Java virtual machine (JVM) on symmetrical multiprocessing (SMP) systems. Using a cycle-accurate simulator, we evaluate the performance scaling of multithreaded Java benchmarks with the number of processors and application threads. By correlating low-level hardware performance data to two high-level software constructs: thread types and memory regions, we present in detail the performance analysis and study the potential performance impacts of memory system latencies and resource contentions on scalability. Several key findings are revealed through this paper. First, among the memory access latency components, the primary portion of memory stalls are produced by L2 cache misses and cache-to-cache transfers. Second, among the regions of memory, Java heap space produces most memory stalls. Additionally, a large majority of memory stalls occur in application threads, as opposed to other JVM threads. Furthermore, we find that increasing the number of processors or application threads, independently of each other, leads to increases in L2 cache miss ratio and cache-to-cache transfer ratio. This problem can be alleviated by using a thread-local heap or allocation buffer which can improve L2 cache performance. For certain benchmarks such as Raytracer, their cache-to-cache transfers, mainly dominated by false sharing, can be significantly reduced. Our experiments also show that a thread-local allocation buffer with a size between 16KB and 256KB often leads to optimal performance.","PeriodicalId":275514,"journal":{"name":"IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2005.1526008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
This paper studies the scalability issues of Java virtual machine (JVM) on symmetrical multiprocessing (SMP) systems. Using a cycle-accurate simulator, we evaluate the performance scaling of multithreaded Java benchmarks with the number of processors and application threads. By correlating low-level hardware performance data to two high-level software constructs: thread types and memory regions, we present in detail the performance analysis and study the potential performance impacts of memory system latencies and resource contentions on scalability. Several key findings are revealed through this paper. First, among the memory access latency components, the primary portion of memory stalls are produced by L2 cache misses and cache-to-cache transfers. Second, among the regions of memory, Java heap space produces most memory stalls. Additionally, a large majority of memory stalls occur in application threads, as opposed to other JVM threads. Furthermore, we find that increasing the number of processors or application threads, independently of each other, leads to increases in L2 cache miss ratio and cache-to-cache transfer ratio. This problem can be alleviated by using a thread-local heap or allocation buffer which can improve L2 cache performance. For certain benchmarks such as Raytracer, their cache-to-cache transfers, mainly dominated by false sharing, can be significantly reduced. Our experiments also show that a thread-local allocation buffer with a size between 16KB and 256KB often leads to optimal performance.