2008 IEEE International Symposium on Workload Characterization: Latest Papers

Implications of cache asymmetry on server consolidation performance

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636088

P. Apparao, R. Iyer, D. Newell

Abstract: Today's CMP platforms are designed to be symmetric in terms of platform resources such as shared caches. However, it is becoming increasingly important to understand the performance implications of asymmetric caches for two key reasons: (a) multi-workload scenarios such as server consolidation are a growing trend and contention for shared cache resources between workloads causes logical cache asymmetry, (b) future CMP platforms may be designed to be physically asymmetric in hardware due to die area pressure, process variability or power/performance efficiency. Our focus in this paper is to understand the performance implications of both logical as well as physical asymmetric caches on server consolidation. Based on real measurements of a state-of-the-art CMP processor running a server consolidation benchmark (vConsolidate) we compare the performance implications as a function of (a) symmetric caches, (b) virtually asymmetric caches, (c) physically asymmetric caches and (d) a combination of logically and physically asymmetric caches. We analyze the performance behavior in terms of (i) performance of each of the individual workloads being consolidated and (ii) architectural components such as CPI and MPI. We believe that this asymmetric cache study is the first of its kind and provides useful data/insights on cache characteristics of server consolidation. We also present inferences on future optimizations in the VMM scheduler as well as potential hardware techniques for future CMPs with cache asymmetry.

Citations: 20
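The abstract above reports results in terms of CPI and MPI without defining them. A minimal sketch follows, assuming the usual readings of cycles per instruction and last-level cache misses per instruction. The `CounterSample` type, its field names, and the counter values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Raw counter readings for one workload (hypothetical field names)."""
    cycles: int
    instructions: int
    llc_misses: int


def cpi(sample: CounterSample) -> float:
    # Cycles per instruction: lower means the workload runs faster.
    return sample.cycles / sample.instructions


def mpi(sample: CounterSample) -> float:
    # Last-level cache misses per instruction (assumed meaning of MPI).
    return sample.llc_misses / sample.instructions


# Made-up readings for one workload given a large and a small cache share.
large_cache = CounterSample(cycles=2_000_000, instructions=1_000_000, llc_misses=5_000)
small_cache = CounterSample(cycles=3_000_000, instructions=1_000_000, llc_misses=20_000)

for name, sample in [("large share", large_cache), ("small share", small_cache)]:
    print(f"{name}: CPI={cpi(sample):.2f} MPI={mpi(sample):.4f}")
```

With these invented numbers the smaller cache share raises MPI from 0.005 to 0.02 and CPI from 2.0 to 3.0, which is the kind of per-workload comparison the abstract describes.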

Parallelization and characterization of SIFT on multi-core systems

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636087

Hao Feng, E. Li, Yurong Chen, Yimin Zhang

Abstract: This paper parallelizes and characterizes an important computer vision application, Scale Invariant Feature Transform (SIFT), both on a Symmetric Multiprocessor (SMP) platform and a large-scale Chip Multiprocessor (CMP) simulator. SIFT is an approach for extracting distinctive invariant features from images and has been widely applied. In many computer vision problems, a real-time or even super-real-time processing capability of SIFT is required. To meet the computation demand, we optimize and parallelize SIFT to accelerate its execution on multi-core systems. Our study shows that SIFT can achieve a 9.7x to 11x speedup on a 16-core SMP system. Furthermore, Single Instruction Multiple Data (SIMD) and cache-conscious optimization bring another performance gain of at most 85%. But it is still three times slower than the real-time requirement for High-Definition Television (HDTV) images. Then we study the performance of SIFT on a 64-core CMP simulator. The results show that for HDTV images, SIFT can achieve an excellent speedup of 52x and finally run in real time. Besides the parallelization and optimization work, we also conduct a detailed performance analysis for SIFT on those two platforms. We find that load imbalance significantly limits the scalability and that SIFT suffers from an intensive burst memory bandwidth requirement on the 16-core SMP system. However, on the 64-core CMP simulator the memory pressure is not high due to the shared last-level cache (LLC), which accommodates tremendous read-write sharing in SIFT. Thus it does not affect the scaling performance. In short, understanding the characterization of SIFT can help identify the program bottlenecks and give us further insights into designing better systems.

Citations: 51
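The speedup figures quoted in the SIFT abstract above can be put on a common scale by dividing each by its core count. The sketch below does that arithmetic. The function name `parallel_efficiency` is illustrative, and the efficiency metric is a standard one that the abstract itself does not report.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


# Speedups and core counts as quoted in the abstract.
quoted = [
    ("16-core SMP, low end", 9.7, 16),
    ("16-core SMP, high end", 11.0, 16),
    ("64-core CMP simulator", 52.0, 64),
]

for label, speedup, cores in quoted:
    print(f"{label}: efficiency = {parallel_efficiency(speedup, cores):.3f}")
```

The ratios come out at roughly 0.61 to 0.69 on the SMP system and about 0.81 on the simulator, which agrees with the abstract's remark that memory pressure does not limit scaling on the 64-core configuration.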

Can hardware performance counters be trusted?

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636099

Vincent M. Weaver, S. McKee

Abstract: When creating architectural tools, it is essential to know whether the generated results make sense. Comparing a tool's outputs against hardware performance counters on an actual machine is a common means of executing a quick sanity check. If the results do not match, this can indicate problems with the tool, unknown interactions with the benchmarks being investigated, or even unexpected behavior of the real hardware. To make future analyses of this type easier, we explore the behavior of the SPEC benchmarks with both dynamic binary instrumentation (DBI) tools and hardware counters. We collect retired instruction performance counter data from the full SPEC CPU 2000 and 2006 benchmark suites on nine different implementations of the x86 architecture. When run with no special preparation, hardware counters have a coefficient of variation of up to 1.07%. After analyzing results in depth, we find that minor changes to the experimental setup reduce observed errors to less than 0.002% for all benchmarks. The fact that subtle changes in how experiments are conducted can largely impact observed results is unexpected, and it is important that researchers using these counters be aware of the issues involved.

Citations: 105
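The coefficient of variation cited in the abstract above is the standard deviation of repeated measurements divided by their mean. A minimal sketch of that calculation follows, using the sample standard deviation. The run counts are invented for illustration and are not data from the paper, which may also have used a different estimator.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean, as a percentage."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    # statistics.stdev needs at least two samples and raises otherwise.
    return statistics.stdev(samples) / mean * 100.0


# Hypothetical retired-instruction counts from five runs of one benchmark.
runs = [1_000_000, 1_000_200, 999_800, 1_000_100, 999_900]
cv = coefficient_of_variation(runs)
print(f"coefficient of variation: {cv:.4f}%")
```

For these made-up counts the result is about 0.016%, which sits between the 1.07% and 0.002% figures the abstract quotes for unprepared and carefully prepared runs.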

Accelerating multi-core processor design space evaluation using automatic multi-threaded workload synthesis

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636101

C. Hughes, Tao Li

Abstract: The design and evaluation of microprocessor architectures is a difficult and time-consuming task. Although small, hand-coded microbenchmarks can be used to accelerate performance evaluation, these programs lack the complexity to stress increasingly complex architecture designs. Larger and more complex real-world workloads should be employed to measure the performance of a given design or to evaluate the efficiency of various design alternatives. These applications can take days or weeks if run to completion on a detailed architecture simulator. In the past, researchers have applied machine learning and statistical sampling methods to reduce the average number of instructions required for detailed simulation. Others have proposed statistical simulation and workload synthesis techniques, which can produce programs that emulate the execution characteristics of the application from which they are derived but have a much shorter execution period than the original. However, these existing methods are difficult to apply to multi-threaded programs and can result in simplifications that miss the complex interactions between multiple, concurrently running threads. This study focuses on developing new techniques for accurate and effective multi-threaded workload synthesis, which can significantly accelerate architecture design evaluation of multi-core processors. We propose to construct synchronized statistical flow graphs that incorporate inter-thread synchronization and sharing behavior to capture the complex characteristics and interactions of multiple threads. Moreover, we develop thread-aware data reference models and wavelet-based branching models to generate accurate memory access and dynamic branch statistics. Experimental results show that a framework integrated with the aforementioned models can automatically generate synthetic programs that maintain characteristics of original workloads but have significantly reduced runtime.

Citations: 24

Energy-aware application scheduling on a heterogeneous multi-core system

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636086

Jing Chen, L. John

Citations: 23

PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636090

Christian Bienia, Sanjeev Kumar, Kai Li

Citations: 238

Whiteboards that compute: A workload analysis

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636092

Ryan Dixon, T. Sherwood

Citations: 5

Characterization of storage workload traces from production Windows Servers

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636097

Swaroop Kavalanekar, Bruce L. Worthington, Qi Zhang, Vishal Sharda

Abstract: The scarcity of publicly available storage workload traces of production servers impairs characterization, modeling research, and development efforts across the storage industry. Twelve sets of storage traces from a diverse set of Microsoft Corporation production servers were captured using ETW (Event Tracing for Windows) instrumentation. Windows Server 2008 dramatically increases the breadth and depth of ETW instrumentation, and new trace capture and visualization tools are available in the Windows Performance Tools kit. Additional analytical tools were developed to analyze and visualize traces captured from Exchange, software build and release, Live Maps, MSN storage, security authentication, and display advertisement platform servers. This paper contains a first set of characterizations for these traces, including simple block-level statistics, multi-parameter distributions, rankings of file access frequencies, and more complex analyses such as temporal and spatial self-similarity measurements. Trace data visualizations enable the examination of workload parameters, subcomponents, phases, and deviations from predicted behavior.

Citations: 271

We have it easy, but do we have it right?

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-04-14 DOI: 10.1109/IPDPS.2008.4536408

Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, P. Sweeney

Abstract: Summary form only given. To evaluate an innovation in computer systems, performance analysts measure execution time or other metrics using one or more standard workloads. The performance analyst may carefully minimize the amount of measurement instrumentation, control the environment in which measurement takes place, and repeat each measurement multiple times. Finally, the performance analyst may use statistical techniques to characterize the data. Unfortunately, even with such a responsible approach, the collected data may be misleading due to measurement bias and observer effect. Measurement bias occurs when the experimental setup inadvertently favors a particular outcome. Observer effect occurs if data collection alters the behavior of the system being measured. This talk demonstrates that observer effect and measurement bias are (i) large enough to mislead performance analysts; and (ii) common enough that they cannot be ignored. While these phenomena are well known in the natural and social sciences, this talk will demonstrate that research in computer systems typically does not take adequate measures to guard against measurement bias and observer effect.

Citations: 19