Characterizing a Complex J2EE Workload: A Comprehensive Analysis and Opportunities for Optimizations

Yefim Shuf, I. Steiner
{"title":"Characterizing a Complex J2EE Workload: A Comprehensive Analysis and Opportunities for Optimizations","authors":"Yefim Shuf, I. Steiner","doi":"10.1109/ISPASS.2007.363735","DOIUrl":null,"url":null,"abstract":"While past studies of relatively simple Java benchmarks like SPECjvm98 and SPECjbb2000 have been integral in advancing the server industry, this paper presents an analysis of a significantly more complex 3-Tier J2EE (Java 2 Enterprise Edition) commercial workload, SPECjAppServer2004. Understanding the nature of such commercial workloads is critical to develop the next generation of servers and identify promising directions for systems and software research. In this study, we validate and disprove several assumptions commonly made about Java workloads. For instance, on a tuned system with an appropriately sized heap, the fraction of CPU time spent on garbage collection for this complex workload is small (<2%) compared to commonly studied client-side Java benchmarks. Unlike small benchmarks, this workload has a rather \"flat\" method profile with no obvious hot spots. Therefore, new performance analysis techniques and tools to identify opportunities for optimizations are needed because the traditional 90/10 rule of thumb does not apply. We evaluate hardware performance monitor data and use insights to motivate future research. We find that this workload has a relatively high CPI and a branch misprediction rate. We observe that almost one half of executed instructions are loads and stores and that the data working set is large. There are very few cache-to-cache \"modified data\" transfers which limits opportunities for intelligent thread co-scheduling. We note that while using large pages for a Java heap is a simple and effective way to reduce TLB misses and improve performance, there is room to reduce translation misses further by placing executable code into large pages. We use statistical correlation to quantify the relationship between various hardware events and an overall system performance. We find that CPI is strongly correlated with branch mispredictions, translation misses, instruction cache misses, and bursty data cache misses that trigger data prefetching. We note that target address mispredictions for indirect branches (corresponding to Java virtual method calls) are strongly correlated with instruction cache misses. Our observations can be used by hardware and runtime architects to estimate potential benefits of performance enhancements being considered","PeriodicalId":439151,"journal":{"name":"2007 IEEE International Symposium on Performance Analysis of Systems & Software","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE International Symposium on Performance Analysis of Systems & Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2007.363735","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

While past studies of relatively simple Java benchmarks like SPECjvm98 and SPECjbb2000 have been integral in advancing the server industry, this paper presents an analysis of a significantly more complex 3-Tier J2EE (Java 2 Enterprise Edition) commercial workload, SPECjAppServer2004. Understanding the nature of such commercial workloads is critical to develop the next generation of servers and identify promising directions for systems and software research. In this study, we validate and disprove several assumptions commonly made about Java workloads. For instance, on a tuned system with an appropriately sized heap, the fraction of CPU time spent on garbage collection for this complex workload is small (<2%) compared to commonly studied client-side Java benchmarks. Unlike small benchmarks, this workload has a rather "flat" method profile with no obvious hot spots. Therefore, new performance analysis techniques and tools to identify opportunities for optimizations are needed because the traditional 90/10 rule of thumb does not apply. We evaluate hardware performance monitor data and use insights to motivate future research. We find that this workload has a relatively high CPI and a branch misprediction rate. We observe that almost one half of executed instructions are loads and stores and that the data working set is large. There are very few cache-to-cache "modified data" transfers which limits opportunities for intelligent thread co-scheduling. We note that while using large pages for a Java heap is a simple and effective way to reduce TLB misses and improve performance, there is room to reduce translation misses further by placing executable code into large pages. We use statistical correlation to quantify the relationship between various hardware events and an overall system performance. We find that CPI is strongly correlated with branch mispredictions, translation misses, instruction cache misses, and bursty data cache misses that trigger data prefetching. We note that target address mispredictions for indirect branches (corresponding to Java virtual method calls) are strongly correlated with instruction cache misses. Our observations can be used by hardware and runtime architects to estimate potential benefits of performance enhancements being considered
描述复杂的J2EE工作负载:全面的分析和优化的机会
虽然过去对相对简单的Java基准(如SPECjvm98和SPECjbb2000)的研究在推动服务器行业发展方面发挥了不可或缺的作用,但本文介绍了对更复杂的3层J2EE (Java 2企业版)商业工作负载SPECjAppServer2004的分析。了解此类商业工作负载的性质对于开发下一代服务器和确定系统和软件研究的有前途的方向至关重要。在本研究中,我们验证和反驳了关于Java工作负载的几个常见假设。例如,在具有适当大小堆的调优系统上,与通常研究的客户端Java基准测试相比,用于此复杂工作负载的垃圾收集的CPU时间比例很小(<2%)。与小型基准测试不同,此工作负载具有相当“平坦”的方法配置文件,没有明显的热点。因此,需要新的性能分析技术和工具来识别优化机会,因为传统的90/10法则不适用。我们评估硬件性能监控数据,并使用见解来激励未来的研究。我们发现此工作负载具有相对较高的CPI和分支错误预测率。我们观察到,几乎一半的执行指令是加载和存储指令,而且数据工作集很大。很少有缓存到缓存的“修改数据”传输,这限制了智能线程协同调度的机会。我们注意到,虽然为Java堆使用大页面是减少TLB错误和提高性能的一种简单而有效的方法,但是通过将可执行代码放入大页面中,还有进一步减少翻译错误的空间。我们使用统计相关性来量化各种硬件事件与整体系统性能之间的关系。我们发现CPI与分支错误预测、转换错误、指令缓存错误和触发数据预取的突发数据缓存错误密切相关。我们注意到,间接分支(对应于Java虚拟方法调用)的目标地址错误预测与指令缓存丢失密切相关。硬件和运行时架构师可以使用我们的观察结果来评估正在考虑的性能增强的潜在好处
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信