2008 IEEE International Symposium on Workload Characterization: Latest Publications

Implications of cache asymmetry on server consolidation performance
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-09-30 DOI: 10.1109/IISWC.2008.4636088
P. Apparao, R. Iyer, D. Newell
Abstract: Today's CMP platforms are designed to be symmetric in terms of platform resources such as shared caches. However, it is becoming increasingly important to understand the performance implications of asymmetric caches, for two key reasons. (a) Multi-workload scenarios such as server consolidation are a growing trend, and contention for shared cache resources between workloads causes logical cache asymmetry. (b) Future CMP platforms may be designed to be physically asymmetric in hardware due to die area pressure, process variability, or power/performance efficiency. Our focus in this paper is to understand the performance implications of both logically and physically asymmetric caches on server consolidation. Based on real measurements of a state-of-the-art CMP processor running a server consolidation benchmark (vConsolidate), we compare the performance implications as a function of (a) symmetric caches, (b) virtually asymmetric caches, (c) physically asymmetric caches, and (d) a combination of logically and physically asymmetric caches. We analyze the performance behavior in terms of (i) the performance of each individual workload being consolidated and (ii) architectural metrics such as CPI and MPI. We believe that this asymmetric cache study is the first of its kind and provides useful data and insights on the cache characteristics of server consolidation. We also present inferences on future optimizations in the VMM scheduler as well as potential hardware techniques for future CMPs with cache asymmetry.
Citations: 20
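The abstract breaks performance down into CPI (cycles per instruction) and MPI (cache misses per instruction). The sketch below shows how these two ratios are computed. It assumes raw counts of cycles, retired instructions, and last-level cache misses are already available for each workload. The `CounterSample` type, its field names, and the numbers are hypothetical; they are not vConsolidate output or the authors' data.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw event counts for one workload (hypothetical field names)."""
    cycles: int
    instructions_retired: int
    llc_misses: int


def cpi(sample: CounterSample) -> float:
    """Cycles per instruction."""
    return sample.cycles / sample.instructions_retired


def mpi(sample: CounterSample) -> float:
    """Last-level cache misses per instruction."""
    return sample.llc_misses / sample.instructions_retired


# Invented counts for one VM measured under two cache configurations.
symmetric = CounterSample(cycles=3_000_000, instructions_retired=2_000_000, llc_misses=4_000)
asymmetric = CounterSample(cycles=3_600_000, instructions_retired=2_000_000, llc_misses=9_000)

for name, s in [("symmetric", symmetric), ("asymmetric", asymmetric)]:
    print(f"{name}: CPI={cpi(s):.2f}, MPI={mpi(s):.4f}")
```

With the same instruction count in both cases, a smaller effective cache share shows up as a higher MPI and usually a higher CPI. The invented numbers here only illustrate that direction.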
Parallelization and characterization of SIFT on multi-core systems
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-09-30 DOI: 10.1109/IISWC.2008.4636087
Hao Feng, E. Li, Yurong Chen, Yimin Zhang
Abstract: This paper parallelizes and characterizes an important computer vision application, the Scale Invariant Feature Transform (SIFT), on both a Symmetric Multiprocessor (SMP) platform and a large-scale Chip Multiprocessor (CMP) simulator. SIFT is an approach for extracting distinctive invariant features from images and has been widely applied. Many computer vision problems require real-time or even super-real-time SIFT processing. To meet this computational demand, we optimize and parallelize SIFT to accelerate its execution on multi-core systems. Our study shows that SIFT can achieve a 9.7x to 11x speedup on a 16-core SMP system. Furthermore, Single Instruction Multiple Data (SIMD) and cache-conscious optimizations bring up to an additional 85% performance gain. Even so, SIFT is still three times slower than the real-time requirement for High-Definition Television (HDTV) images. We then study the performance of SIFT on a 64-core CMP simulator. The results show that for HDTV images, SIFT can achieve an excellent speedup of 52x and finally runs in real time. Besides the parallelization and optimization work, we also conduct a detailed performance analysis of SIFT on both platforms. We find that load imbalance significantly limits scalability and that SIFT suffers from intensive bursts of memory bandwidth demand on the 16-core SMP system. On the 64-core CMP simulator, however, memory pressure is low because the shared last-level cache (LLC) accommodates the heavy read-write sharing in SIFT, so memory bandwidth does not limit scaling there. In short, characterizing SIFT helps identify the program's bottlenecks and gives further insight into designing better systems.
Citations: 51
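Parallel efficiency puts the reported speedups in context by dividing speedup by core count. The sketch below applies the standard definitions to the figures quoted in the abstract (9.7x and 11x on 16 cores, 52x on 64 cores). It is arithmetic on those numbers only and does not reproduce the authors' measurements.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p."""
    return t_serial / t_parallel


def parallel_efficiency(s: float, cores: int) -> float:
    """Efficiency E = S / p: the fraction of ideal linear speedup achieved."""
    return s / cores


# Speedups quoted in the abstract, paired with the core counts they were measured on.
reported = [
    ("16-core SMP (low end)", 9.7, 16),
    ("16-core SMP (high end)", 11.0, 16),
    ("64-core CMP simulator, HDTV", 52.0, 64),
]

for label, s, p in reported:
    print(f"{label}: speedup {s:.1f}x, efficiency {parallel_efficiency(s, p):.0%}")
```

By this measure, the 16-core runs reach roughly 61% to 69% of linear speedup and the 64-core simulation about 81%. That ordering matches the abstract's account of the SMP system being held back by load imbalance and memory bandwidth. However, the two platforms and their inputs are not directly comparable.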
Can hardware performance counters be trusted?
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-09-30 DOI: 10.1109/IISWC.2008.4636099
Vincent M. Weaver, S. McKee
Abstract: When creating architectural tools, it is essential to know whether the generated results make sense. Comparing a tool's outputs against hardware performance counters on an actual machine is a common way to perform a quick sanity check. If the results do not match, this can indicate problems with the tool, unknown interactions with the benchmarks being investigated, or even unexpected behavior of the real hardware. To make future analyses of this type easier, we explore the behavior of the SPEC benchmarks with both dynamic binary instrumentation (DBI) tools and hardware counters. We collect retired-instruction performance counter data from the full SPEC CPU 2000 and 2006 benchmark suites on nine different implementations of the x86 architecture. When run with no special preparation, hardware counters have a coefficient of variation of up to 1.07%. After analyzing the results in depth, we find that minor changes to the experimental setup reduce observed errors to less than 0.002% for all benchmarks. It is unexpected that subtle changes in how experiments are conducted can have such a large impact on observed results, and it is important that researchers using these counters be aware of the issues involved.
Citations: 105
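The abstract reports run-to-run variability as a coefficient of variation (CV): the standard deviation of repeated measurements divided by their mean. The sketch below computes it, assuming the retired-instruction counts have already been collected, one per run. The run values are invented. The abstract does not say whether sample or population standard deviation was used; this sketch assumes the sample form.

```python
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Sample standard deviation divided by the mean, returned as a fraction."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variation")
    return statistics.stdev(samples) / statistics.mean(samples)


# Invented retired-instruction counts from five runs of one benchmark.
runs = [1_000_120_000, 1_000_450_000, 999_980_000, 1_000_300_000, 1_000_050_000]
print(f"CV = {coefficient_of_variation(runs):.5%}")
```

Using the population form (`statistics.pstdev`) instead would give a slightly smaller value for small run counts. Reporting which form was used matters when comparing against figures such as the 1.07% and 0.002% above.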
Accelerating multi-core processor design space evaluation using automatic multi-threaded workload synthesis
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-09-30 DOI: 10.1109/IISWC.2008.4636101
C. Hughes, Tao Li
Abstract: The design and evaluation of microprocessor architectures is a difficult and time-consuming task. Although small, hand-coded microbenchmarks can be used to accelerate performance evaluation, these programs lack the complexity to stress increasingly complex architecture designs. Larger and more complex real-world workloads should be employed to measure the performance of a given design or to evaluate the efficiency of various design alternatives. These applications can take days or weeks if run to completion on a detailed architecture simulator. In the past, researchers have applied machine learning and statistical sampling methods to reduce the average number of instructions required for detailed simulation. Others have proposed statistical simulation and workload synthesis techniques. These techniques produce programs that emulate the execution characteristics of the application from which they are derived but run for a much shorter time than the original. However, these existing methods are difficult to apply to multi-threaded programs and can result in simplifications that miss the complex interactions between multiple, concurrently running threads. This study focuses on developing new techniques for accurate and effective multi-threaded workload synthesis, which can significantly accelerate architecture design evaluation of multi-core processors. We propose to construct synchronized statistical flow graphs that incorporate inter-thread synchronization and sharing behavior to capture the complex characteristics and interactions of multiple threads. Moreover, we develop thread-aware data reference models and wavelet-based branching models to generate accurate memory access and dynamic branch statistics. Experimental results show that a framework integrating these models can automatically generate synthetic programs that maintain the characteristics of the original workloads but have significantly reduced runtimes.
Citations: 24
Energy-aware application scheduling on a heterogeneous multi-core system
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-09-30 DOI: 10.1109/IISWC.2008.4636086
Jing Chen, L. John
Abstract: Heterogeneous multi-core processors are attractive for power-efficient computing because they can meet the varied resource requirements of the diverse applications in a workload. However, one of the challenges of using a heterogeneous multi-core processor is scheduling each program in a workload onto the core that can execute it most efficiently. This paper presents an energy-aware scheduling mechanism that employs fuzzy logic to calculate the suitability between programs and cores by analyzing important inherent program characteristics such as instruction dependency distance and branch transition rate. The resulting suitability is then used to guide program scheduling in the heterogeneous multi-core system. The experimental results show that the proposed suitability-guided program scheduling mechanism achieves up to a 15.0% average reduction in energy-delay product compared with a random scheduling approach. To the best of our knowledge, this study is the first to apply fuzzy logic to schedule programs in heterogeneous multi-core systems.
Citations: 23
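The evaluation metric is energy-delay product (EDP): energy consumed multiplied by execution time. Because both factors count, EDP rewards schedules that save energy without simply running slower. The sketch below shows how a reduction relative to a random-scheduling baseline would be computed. The energy and runtime figures are hypothetical, and the code does not model the fuzzy-logic scheduler itself.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy x execution time; lower is better."""
    return energy_joules * delay_seconds


def percent_reduction(baseline: float, candidate: float) -> float:
    """Reduction of `candidate` relative to `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical energy (J) and runtime (s) for one workload under two schedulers.
random_edp = energy_delay_product(energy_joules=50.0, delay_seconds=2.0)
guided_edp = energy_delay_product(energy_joules=45.0, delay_seconds=1.9)

print(f"EDP reduction vs. random: {percent_reduction(random_edp, guided_edp):.1f}%")
```

An "average reduction" such as the one in the abstract would come from averaging this per-workload figure across many workload mixes. How the authors aggregated their results is not stated here.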
PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-09-30 DOI: 10.1109/IISWC.2008.4636090
Christian Bienia, Sanjeev Kumar, Kai Li
Abstract: The PARSEC benchmark suite was recently released and has been adopted by a significant number of users within a short time. This new collection of workloads is not yet fully understood by researchers. In this study, we compare the SPLASH-2 and PARSEC benchmark suites to gain insight into the differences and similarities between the two program collections. We use standard statistical methods and machine learning to analyze the suites for redundancy and overlap on chip-multiprocessors (CMPs). Our analysis shows that PARSEC workloads are fundamentally different from SPLASH-2 benchmarks. The observed differences can be explained by two technology trends: the proliferation of CMPs and the accelerating growth of the world's data.
Citations: 238
Whiteboards that compute: A workload analysis
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-09-30 DOI: 10.1109/IISWC.2008.4636092
Ryan Dixon, T. Sherwood
Abstract: A whiteboard that automatically identifies drawn strokes, interprets them in context, and augments drawn images with computational results (such as solutions to mathematical equations or results of circuit simulations) is a surprisingly realistic goal for systems architects. In this paper, we describe the state of this emerging domain and argue that technical trends will make this a particularly attractive workload in the future. We provide a preliminary characterization of the critical loops that exist within one state-of-the-art system currently under development, and we attempt to quantify the workloads that whiteboard-sized devices are likely to face in the future. This work is by no means a typical workload characterization paper. However, given the shift in programming models that we are about to endure, it is now more important than ever to identify and understand the applications that have the potential to drive our industry forward.
Citations: 5
Characterization of storage workload traces from production Windows Servers
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-09-30 DOI: 10.1109/IISWC.2008.4636097
Swaroop Kavalanekar, Bruce L. Worthington, Qi Zhang, Vishal Sharda
Abstract: The scarcity of publicly available storage workload traces from production servers impairs characterization, modeling research, and development efforts across the storage industry. Twelve sets of storage traces from a diverse set of Microsoft Corporation production servers were captured using ETW (Event Tracing for Windows) instrumentation. Windows Server 2008 dramatically increases the breadth and depth of ETW instrumentation, and new trace capture and visualization tools are available in the Windows Performance Tools kit. Additional analytical tools were developed to analyze and visualize traces captured from Exchange, software build and release, Live Maps, MSN storage, security authentication, and display advertisement platform servers. This paper contains a first set of characterizations of these traces. These include simple block-level statistics, multi-parameter distributions, and rankings of file access frequencies, as well as more complex analyses such as temporal and spatial self-similarity measurements. Trace data visualizations enable the examination of workload parameters, subcomponents, phases, and deviations from predicted behavior.
Citations: 271
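The paper's first characterizations start from simple block-level statistics. The sketch below is a rough illustration of that kind of summary. It assumes a trace has already been parsed into time-ordered records. The `IORequest` type, its fields, and the sample trace are hypothetical; they do not mirror the ETW event format or the authors' analysis tools.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    """One block-level request (hypothetical fields, not the ETW event schema)."""
    timestamp_us: int
    is_read: bool
    offset_bytes: int
    size_bytes: int


def summarize(trace: list[IORequest]) -> dict[str, float]:
    """Simple block-level statistics over a non-empty, time-ordered trace."""
    if not trace:
        raise ValueError("empty trace")
    reads = [r for r in trace if r.is_read]
    duration_s = (trace[-1].timestamp_us - trace[0].timestamp_us) / 1e6
    return {
        "requests": float(len(trace)),
        "read_fraction": len(reads) / len(trace),
        "mean_size_kib": sum(r.size_bytes for r in trace) / len(trace) / 1024,
        # Average request rate over the trace span; undefined for a zero-length span.
        "iops": len(trace) / duration_s if duration_s > 0 else float("nan"),
    }


# A tiny invented trace spanning one second.
trace = [
    IORequest(0, True, 0, 4096),
    IORequest(250_000, True, 4096, 8192),
    IORequest(500_000, False, 1_048_576, 4096),
    IORequest(1_000_000, True, 12_288, 16_384),
]
print(summarize(trace))
```

The multi-parameter distributions and self-similarity analyses mentioned in the abstract would build on per-request fields like these. They are out of scope for this sketch.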
We have it easy, but do we have it right?
2008 IEEE International Symposium on Workload Characterization Pub Date: 2008-04-14 DOI: 10.1109/IPDPS.2008.4536408
Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, P. Sweeney
Abstract: Summary form only given. To evaluate an innovation in computer systems, performance analysts measure execution time or other metrics using one or more standard workloads. The performance analyst may carefully minimize the amount of measurement instrumentation, control the environment in which measurement takes place, and repeat each measurement multiple times. Finally, the performance analyst may use statistical techniques to characterize the data. Unfortunately, even with such a responsible approach, the collected data may be misleading due to measurement bias and the observer effect. Measurement bias occurs when the experimental setup inadvertently favors a particular outcome. The observer effect occurs when data collection alters the behavior of the system being measured. This talk demonstrates that the observer effect and measurement bias are (i) large enough to mislead performance analysts and (ii) common enough that they cannot be ignored. While these phenomena are well known in the natural and social sciences, this talk will demonstrate that research in computer systems typically does not take adequate measures to guard against measurement bias and the observer effect.
Citations: 19