2006 IEEE International Symposium on Workload Characterization: Latest Papers

Evaluating Benchmark Subsetting Approaches
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-12-01 DOI: 10.1109/IISWC.2006.302733
J. Yi, Resit Sendag, L. Eeckhout, A. Joshi, D. Lilja, L. John
Abstract: To reduce the simulation time to a tractable amount or due to compilation (or other related) problems, computer architects often simulate only a subset of the benchmarks in a benchmark suite. However, if the architect chooses a subset of benchmarks that is not representative, the subsequent simulation results will, at best, be misleading or, at worst, yield incorrect conclusions. To address this problem, computer architects have recently proposed several statistically-based approaches to subset a benchmark suite. While some of these approaches are well-grounded statistically, what has not yet been thoroughly evaluated is the: 1) absolute accuracy; 2) relative accuracy across a range of processor and memory subsystem enhancements; and 3) representativeness and coverage of each approach for a range of subset sizes. Specifically, this paper evaluates statistically-based subsetting approaches based on principal components analysis (PCA) and the Plackett and Burman (P&B) design, in addition to prevailing approaches such as integer vs. floating-point, core vs. memory-bound, by language, and at random. Our results show that the two statistically-based approaches, PCA and P&B, have the best absolute and relative accuracy for CPI and energy-delay product (EDP), produce subsets that are the most representative, and choose benchmark and input set pairs that are most well-distributed across the benchmark space. To achieve a 5% absolute CPI and EDP error, across a wide range of configurations, PCA and P&B typically need about 17 benchmark and input set pairs, while the other five approaches often choose more than 30 benchmark and input set pairs.
Citations: 36
An Architectural Characterization Study of Data Mining and Bioinformatics Workloads
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-25 DOI: 10.1109/IISWC.2006.302730
Berkin Özisikyilmaz, R. Narayanan, Joseph Zambreno, G. Memik, A. Choudhary
Abstract: Data mining is the process of automatically finding implicit, previously unknown, and potentially useful information from large volumes of data. Advances in data extraction techniques have resulted in a tremendous increase in the input data size of data mining applications. Data mining systems, on the other hand, have been unable to maintain the same rate of growth. Therefore, there is an increasing need to understand the bottlenecks associated with the execution of these applications in modern architectures. In this paper, we present MineBench, a publicly available benchmark suite containing fifteen representative data mining applications belonging to various categories: classification, clustering, association rule mining and optimization. First, we highlight the uniqueness of data mining applications. Subsequently, we evaluate the MineBench applications on an 8-way shared memory (SMP) machine and analyze important performance characteristics such as L1 and L2 cache miss rates, branch misprediction rates
Citations: 32
"Software Performance Tuning with the Apple CHUD Tools"
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-01 DOI: 10.1109/IISWC.2006.302722
R. Altherr, R. D. Bois, L. Hammond, Eric Miller
Abstract: Summary form only given. Many tools have been created to allow software engineers to analyze the execution of their code. While tools such as gprof often work well, most are not integrated very well with each other or the rest of the development environment, and interpreting the data that they provide can be a challenge. Because Apple's MacOS X is based on UNIX, most open source performance analysis tools can be used. However, we have also integrated several key performance tools together and added graphical data visualization to produce the CHUD toolset (available at http://developer.apple.com/tools/download/). With the CHUD tools, programmers can examine the performance of their code using a set of integrated tools that can perform most common performance-measurement tasks, including: traces of function call behavior (like gprof); sampled measurements of program execution timing; traces of software events, such as system calls; and hardware event counter measurements. Moreover, instead of just presenting a few key figures from these measurements in a brief report, the CHUD tools present their results in several textual and graphical formats, with integrated hyperlinks to related assembly and source code, so that programmers can easily either examine how their programs work at a large scale or zoom in and look at individual program phases in several different ways. This tutorial is targeted primarily at students and software engineers who work on UNIX-based systems and want to expand the repertoire of tools that they can use to analyze and improve the performance of their code. However, the material should also be useful to educators who teach performance-oriented programming techniques, as the graphical nature of Shark's output makes it easy to demonstrate program behaviors in an eye-catching manner.
Citations: 1
Workload Characterization of 3D Games
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-01 DOI: 10.1109/IISWC.2006.302726
Jordi Roca, Victor Moya Del Barrio, Carlos González, C. Solis, Agustín Fernández, R. Espasa
Abstract: The rapid pace of change in 3D game technology makes workload characterization necessary for every game generation. Compared to CPU characterization, far less quantitative information about games is available. This paper focuses on analyzing a set of modern 3D games at the API call level and at the microarchitectural level using the Attila simulator. In addition to common geometry metrics, and in order to understand tradeoffs in modern GPUs, the microarchitectural-level metrics allow us to analyze key performance characteristics such as the balance between texture and ALU instructions in fragment programs, dynamic anisotropic ratios, and vertex, z-stencil, color and texture cache performance.
Citations: 18
Comparing Benchmarks Using Key Microarchitecture-Independent Characteristics
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-01 DOI: 10.1109/IISWC.2006.302732
Kenneth Hoste, L. Eeckhout
Abstract: Understanding the behavior of emerging workloads is important for designing next generation microprocessors. For addressing this issue, computer architects and performance analysts build benchmark suites of new application domains and compare the behavioral characteristics of these benchmark suites against well-known benchmark suites. Current practice typically compares workloads based on microarchitecture-dependent characteristics generated from running these workloads on real hardware. There is one pitfall though with comparing benchmarks using microarchitecture-dependent characteristics, namely that completely different inherent program behavior may yield similar microarchitecture-dependent behavior. This paper proposes a methodology for characterizing benchmarks based on microarchitecture-independent characteristics. This methodology minimizes the number of inherent program characteristics that need to be measured by exploiting correlation between program characteristics. In fact, we reduce our 47-dimensional space to an 8-dimensional space without compromising the methodology's ability to compare benchmarks. The important benefits of this methodology are that (i) only a limited number of microarchitecture-independent characteristics need to be measured, and (ii) the resulting workload characterization is easy to interpret. Using this methodology we compare 122 benchmarks from 6 recently proposed benchmark suites. We conclude that some benchmarks in emerging benchmark suites are indeed similar to benchmarks from well-known benchmark suites as suggested through a microarchitecture-dependent characterization. However, other benchmarks are dissimilar based on a microarchitecture-independent characterization although a microarchitecture-dependent characterization suggests the opposite to be true.
Citations: 77
Load Instruction Characterization and Acceleration of the BioPerf Programs
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-01 DOI: 10.1109/IISWC.2006.302731
P. Ratanaworabhan, Martin Burtscher
Abstract: The load instructions of some of the bioinformatics applications in the BioPerf suite possess interesting characteristics: only a few static loads cover almost the entire dynamic load execution and they almost always hit in the data cache. Nevertheless, these load instructions represent a major performance bottleneck. They often precede or follow branches that are hard to predict, which makes their L1 hit latency difficult to hide even in dynamically scheduled execution cores. This paper investigates this behavior and suggests simple source-code transformations to improve the performance of these benchmark programs by up to 92%.
Citations: 4
"Evolve or Die: Making SPEC's CPU Suite Relevant Today and Tomorrow"
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-01 DOI: 10.1109/IISWC.2006.302735
J. Reilly
Abstract: Biography: Jeff Reilly is a Principal Engineer at Intel Corporation. For the past 16+ years, he has been involved in many aspects of computer benchmarking (development, measurement, analysis, projection), including working with several industry consortiums. In particular, Mr. Reilly has chaired the SPEC CPU Subcommittee for close to 15 years, enjoying the opportunity to work with many talented people in shepherding SPEC’s CPU suite through development, including the recently announced CPU2006.
Citations: 4
DFS: A Simple to Write Yet Difficult to Execute Benchmark
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-01 DOI: 10.1109/IISWC.2006.302741
R. Murphy, Jonathan W. Berry, William C. McLendon, B. Hendrickson, Douglas P. Gregor, A. Lumsdaine
Abstract: Many emerging applications are built upon large, unstructured datasets that exhibit highly irregular (or even nearly random) memory access patterns. Examples include informatics applications, and other problems that are often represented by unstructured graph-based data structures. It is well known that these applications are challenging for conventional architectures to execute (either serially or in parallel). The depth first search (DFS) benchmark proposed in this work uses the Boost Graph Library to perform a depth-first search on a large power-law graph, representing "small world" phenomena. The graph in question exhibits a small average distance between any two vertices, a small diameter, and has a few high-degree vertices with a large number of low-degree vertices. Graphs such as this appear in many fields, including networking, biology, social networks, and data mining. Many of these applications are of critical importance to researchers, and the challenge of executing them on conventional machines increases as the graph size grows. The benchmark proposed in this work is used as the basis for many fundamental algorithms in graph theory, is critical to several emerging applications, is memory intensive, and exhibits poor performance on conventional machines. Section 2 quantitatively demonstrates the memory characteristics of the benchmark in an architecture independent fashion, showing that it is extremely memory intensive. Section 3 describes the execution phases of the benchmark, and Section 4 presents the conclusions.
Citations: 18
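As a rough illustration of the kind of traversal the DFS entry above describes, here is a minimal Python sketch of an iterative depth-first search. It is not the benchmark itself: the abstract says the benchmark uses the Boost Graph Library on a large power-law graph, whereas the function name `dfs_order` and the tiny `toy_graph` (one high-degree hub, several low-degree vertices) are hypothetical stand-ins chosen only for readability.

```python
from typing import Dict, List


def dfs_order(graph: Dict[int, List[int]], start: int) -> List[int]:
    """Return vertices in the order an iterative depth-first search first visits them."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex in visited:
            continue
        visited.add(vertex)
        order.append(vertex)
        # Push neighbours in reverse so the first-listed neighbour is explored first.
        for neighbour in reversed(graph.get(vertex, [])):
            if neighbour not in visited:
                stack.append(neighbour)
    return order


# Hypothetical toy graph (an assumption, not data from the paper):
# vertex 0 is a hub, the remaining vertices have degree 1 or 2.
toy_graph = {
    0: [1, 2, 3, 4],
    1: [0, 5],
    2: [0],
    3: [0, 6],
    4: [0],
    5: [1],
    6: [3],
}

print(dfs_order(toy_graph, 0))
```

Even in this small case the next vertex to visit depends on data just read from the adjacency lists, which hints at why the abstract characterizes the access pattern as irregular.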
Exploring Small-Scale and Large-Scale CMP Architectures for Commercial Java Servers
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-01 DOI: 10.1109/IISWC.2006.302744
R. Iyer, M. Bhat, Li Zhao, R. Illikkal, S. Makineni, Michael Jones, K. Shiv, D. Newell
Abstract: As we enter the era of chip multiprocessor (CMP) architectures, it is important that we explore the scaling characteristics of mainstream server workloads on these platforms. In this paper, we analyze the performance of an Enterprise Java workload (SPECjbb2005) on two important classes of CMP architectures. One class of CMP platforms comprises "small-scale" CMP (SCMP) processors with a few large out-of-order cores on the die. Another class of CMP platforms comprises "large-scale" CMP (LCMP) processors with several small in-order cores on the die. For these classes of CMP architectures to succeed, it is important that there are sufficient resources (cache, memory and interconnect) to allow for a balanced scalable platform. In this paper, we focus on evaluating the resource scaling characteristics (cores, caches and memory) of SPECjbb2005 on these two architectures and understanding architectural trade-offs that may be required in future CMP offerings. The overall evaluation is uniquely conducted using four different methodologies (measurements on latest platforms, trace-based cache simulation, trace-based platform simulation and execution-driven emulation). Based on our findings, we summarize the architectural recommendations for future CMP server platforms (e.g. the need for large DRAM caches).
Citations: 13
Workload Characterization of a Parallel Video Mining Application on a 16-Way Shared-Memory Multiprocessor System
2006 IEEE International Symposium on Workload Characterization Pub Date : 2006-10-01 DOI: 10.1109/IISWC.2006.302725
Wenlong Li, E. Li, C. Dulong, Yen-kuang Chen, Tao Wang, Yimin Zhang
Abstract: As video data become more and more pervasive, mining information from multimedia data sources becomes increasingly important, e.g., automatically extracting highlights from soccer game video content. However, the huge computation requirement of mining the data of interest limits its wide use in practice. Since the hardware imperative behind computer architecture is shifting from uniprocessors to multi-core processors, exploiting thread-level parallelism existing in multimedia mining applications is critical to utilizing the hardware resources and accelerating the complex processing of highlight event detection. In this paper we analyze the view type and playfield detection application, a widely used application in sports video mining systems, and we present several different schemes (task level, data-slicing-level, and a hybrid parallel scheme, as well as variations of the hybrid parallel scheme) for parallelizing this application. The hybrid parallel scheme, which exploits data-level and task-slicing-level parallelism, outperforms basic task-level and data-slicing-level schemes, delivering much better performance in terms of execution time and speedup. On a 16-way shared-memory multi-processing system with hardware prefetch enabled, the hybrid scheme achieves a speedup of 10.6x. Detailed performance analysis shows that because of the large working set, the workload often requires data from the off-chip memory. Therefore, the saturated bus bandwidth utilization is the likely cause of bottlenecks for achieving perfect scalability performance. With hardware prefetch enabled, the bus utilization rate on a 16-processor system is about 76% for the hybrid scheme, and the projected bus bandwidth requirement for perfect scalability is about 3.1 GB/s for 16 processors and 6.2 GB/s for 32 processors. In addition, our experiments reveal no other obvious scaling-limiting factors, e.g., synchronization and load imbalance problems are very low even with up to 16 processors.
Citations: 8
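The figures quoted in the video mining entry above can be related with a little arithmetic. The sketch below is only a reading aid: the function names are hypothetical, parallel efficiency is the standard speedup-per-processor ratio rather than a metric the abstract reports, and the linear bandwidth extrapolation is an assumption that merely happens to be consistent with the two projected values given (3.1 GB/s at 16 processors, 6.2 GB/s at 32).

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Speedup divided by processor count (1.0 would be perfect scaling)."""
    return speedup / processors


def projected_bandwidth(known_gb_per_s: float, known_procs: int, target_procs: int) -> float:
    """Extrapolate bus bandwidth linearly with processor count (an assumption)."""
    return known_gb_per_s * target_procs / known_procs


# Values taken from the abstract: 10.6x speedup on 16 processors,
# 3.1 GB/s projected requirement at 16 processors.
efficiency = parallel_efficiency(10.6, 16)
bandwidth_32 = projected_bandwidth(3.1, 16, 32)

print(f"parallel efficiency at 16 processors: {efficiency:.2%}")
print(f"linearly extrapolated bandwidth at 32 processors: {bandwidth_32:.1f} GB/s")
```

Under these assumptions the hybrid scheme reaches roughly two thirds of ideal speedup on 16 processors, which fits the abstract's point that bus bandwidth, rather than synchronization or load imbalance, is the likely limit.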