2008 IEEE International Symposium on Workload Characterization: Latest Publications

Reproducible simulation of multi-threaded workloads for architecture design exploration
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636102
C. Pereira, H. Patil, B. Calder
Abstract: As multiprocessors become mainstream, techniques to address efficient simulation of multi-threaded workloads are needed. Multi-threaded simulation presents a new challenge: non-determinism across simulations for different architecture configurations. If the execution paths between two simulation runs of the same benchmark with the same input are too different, the simulation results cannot be used to compare the configurations. In this paper we focus on a simulation technique to efficiently collect simulation checkpoints for multi-threaded workloads, and to compare simulation runs addressing this non-determinism problem. We focus on user-level simulation of multi-threaded workloads for multiprocessor architectures. We present an approach, based on binary instrumentation, to collect checkpoints for simulation. Our checkpoints allow reproducible execution of the samples across different architecture configurations by controlling the sources of nondeterminism during simulation. This results in stalls that would not naturally occur in execution. We propose techniques that allow us to accurately compare performance across architecture configurations in the presence of these stalls.
Citations: 12
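The stall effect this abstract describes can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the only source of nondeterminism is the order in which threads acquire a single lock, that each thread acquires it once, and that each hold lasts a fixed number of cycles. The function `lock_stalls` and its parameters are hypothetical; this is not the authors' binary-instrumentation mechanism. It shows how forcing a new configuration to follow the order recorded in a checkpoint can make a thread wait even though the lock would naturally have been free.

```python
from typing import Dict, List


def lock_stalls(order: List[int], arrival: Dict[int, int], hold: int) -> int:
    """Total cycles threads spend waiting when the lock is granted strictly in `order`.

    Hypothetical toy model: arrival[t] is the cycle at which thread t requests the
    lock under the configuration being simulated; each grant holds it for `hold` cycles.
    """
    free_at = 0  # cycle at which the lock next becomes free
    stall = 0
    for t in order:
        grant = max(arrival[t], free_at)  # wait until requested AND lock is free
        stall += grant - arrival[t]
        free_at = grant + hold
    return stall


# Order recorded in the checkpoint: thread 0 acquired the lock before thread 1.
recorded_order = [0, 1]
# Under a new (hypothetical) configuration, thread 1 now requests the lock first.
arrival = {0: 10, 1: 2}

replayed = lock_stalls(recorded_order, arrival, hold=3)
natural = lock_stalls(sorted(arrival, key=arrival.get), arrival, hold=3)
print(replayed, natural)  # 11 0
```

In this toy case all 11 stall cycles exist only because replay enforces the recorded order; the abstract notes that such artificial stalls must be accounted for when comparing configurations.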
Evaluating the impact of dynamic binary translation systems on hardware cache performance
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636098
Arkaitz Ruiz-Alvarez, K. Hazelwood
Abstract: Dynamic binary translation systems enable a wide range of applications such as program instrumentation, optimization, and security. DBTs use a software code cache to store previously translated instructions. The code layout in the code cache greatly differs from the code layout of the original program. This paper provides an exhaustive analysis of the performance of the instruction/trace cache and other structures of the micro-architecture while executing DBTs that focus on program instrumentation, such as DynamoRIO and Pin. We performed our evaluation along two axes. First, we directly accessed the hardware performance counters to determine actual cache miss counts. Second, we used simulation to analyze the spatial locality of the translated application. Our results show that when executing an application under the control of Pin or DynamoRIO, the icache miss counts actually increase over 2X. Surprisingly, the L2 cache and the L1 data cache show a much lower performance degradation or even break even with the native application. We also found that overall performance degradations are due to the instructions added by the DBT itself, and that these extra instructions outweigh any possible spatial locality benefits exhibited in the code cache. Our observations held regardless of the trace length, code cache size, or the presence of a hardware trace cache. These results provide a better understanding of the efficiency of current instrumentation tools and their effects on instruction/trace cache performance and other structures of the microarchitecture.
Citations: 14
Wild speculation on consumer workloads in 2010–2020
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636084
Tim Sweeney
Abstract: Summary form only given. Games are among the most performance-intensive consumer applications, and often lead the way in bringing research technologies into practice. This occasionally leads to non-evolutionary leaps in performance and workload characteristics, such as the 1000-fold increase in 3D throughput enabled by consumer graphics accelerators beginning in 1998. The speaker will argue that another revolution in consumer computing performance is on the horizon, driven by large-scale multi-core CPUs with vector-processing extensions inspired by today's graphics processors (GPUs). He will present a view of the key problems and solutions facing consumer software developers in 2010-2020, and speculate on the shape and scale of workloads in that timeframe. The essential questions to cover are: What portions of an application can scale effectively to many cores and vector processors? How and when can concurrency research bring techniques like functional programming, software transactional memory, and vectorization into mainstream practice?
Citations: 0
On the representativeness of embedded Java benchmarks
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636100
C. Isen, L. John, Jung-Pil Choi, H. Song
Abstract: Java has become one of the predominant languages for embedded and mobile platforms due to its architecturally neutral design, portability, and security. But Java execution in the embedded world encompasses Java virtual machines (JVMs) specially tuned for the embedded world, with stripped-down capabilities, and configurations for memory-limited environments. While there have been some studies on desktop and server Java, there have been very few studies on embedded Java. The scarcity of embedded Java benchmarks and the lack of widespread profiling tools and simulators have only exacerbated the problem. While the industry uses some benchmarks such as MorphMark, MIDPMark, and EEMBC Java Grinder Bench, their representativeness in comparison to actual embedded Java applications has not been studied. In order to conduct such a study, we gathered an actual mobile phone application suite and characterized it in detail. We measure several properties of the various applications and benchmarks, perform similarity/dissimilarity analysis, and shed light on the representativeness of current industry-standard embedded benchmarks against actual mobile Java applications. It was observed that for many characteristics, the applications had a broader range, indicating that the benchmarks were underrepresenting the range of characteristics in the real world. Furthermore, we find that the applications exhibit less code reuse/hotness compared to the benchmarks. We also draw comparisons of the embedded benchmarks against popular desktop/client Java benchmarks, such as SPECjvm98 and DaCapo. Interestingly, embedded applications spend a significant amount of time in standard library code, on average 65%, suggesting the usefulness of software and hardware techniques to facilitate pre-compilation without the real-time resource overhead of JIT compilation.
Citations: 14
Temporal streams in commercial server applications
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636095
T. Wenisch, M. Ferdman, A. Ailamaki, B. Falsafi, Andreas Moshovos
Abstract: Commercial server applications remain memory bound on modern multiprocessor systems because of their large data footprints, frequent sharing, complex non-strided access patterns, and long chains of dependent misses. To improve memory system performance despite these challenging access patterns, researchers have proposed prefetchers that exploit temporal streams, i.e., recurring sequences of memory accesses. Although prior studies show substantial performance improvement from such schemes, they fail to explain why temporal streams arise; that is, they treat commercial applications as a black box and do not identify the specific behaviors that lead to recurring miss sequences. In this paper, we perform an information-theoretic analysis of miss traces from single-chip and multi-chip multiprocessors to identify recurring temporal streams in web serving, online transaction processing, and decision support workloads. Then, using function names embedded in the application binaries and Solaris kernel, we identify the code modules and behaviors that give rise to temporal streams.
Citations: 47
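The notion of a temporal stream can be illustrated with a toy predictor. The sketch below assumes a simple policy that is not the paper's analysis method (the paper uses an information-theoretic analysis of miss traces): on each miss, find the most recent earlier miss to the same address and predict that the misses which followed it will recur. The names `temporal_stream_predictions` and `next_miss_accuracy`, and the example trace, are hypothetical.

```python
from typing import Dict, List, Sequence


def temporal_stream_predictions(misses: Sequence[int], depth: int = 2) -> List[List[int]]:
    """For each miss, predict up to `depth` following misses by replaying the
    sequence that followed the most recent earlier miss to the same address."""
    last_seen: Dict[int, int] = {}  # address -> index of its latest occurrence
    predictions: List[List[int]] = []
    for i, addr in enumerate(misses):
        prev = last_seen.get(addr)
        if prev is None:
            predictions.append([])  # first occurrence: no history to replay
        else:
            # Only replay history that precedes the current miss.
            end = min(prev + 1 + depth, i)
            predictions.append(list(misses[prev + 1:end]))
        last_seen[addr] = i
    return predictions


def next_miss_accuracy(misses: Sequence[int], depth: int = 2) -> float:
    """Fraction of misses (after the first) whose address was the first
    prediction made at the preceding miss."""
    preds = temporal_stream_predictions(misses, depth)
    hits = sum(1 for i in range(len(misses) - 1)
               if preds[i] and preds[i][0] == misses[i + 1])
    return hits / max(len(misses) - 1, 1)


trace = [0x100, 0x200, 0x300, 0x900, 0x100, 0x200, 0x300]  # hypothetical miss addresses
print(temporal_stream_predictions(trace)[4])  # [512, 768]
print(next_miss_accuracy(trace))
```

Once the sequence 0x100, 0x200, 0x300 recurs, each miss correctly predicts the next one; a temporal-streaming prefetcher would issue such predicted addresses ahead of demand.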
Characterizing and improving the performance of Intel Threading Building Blocks
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636091
Gilberto Contreras, M. Martonosi
Abstract: The Intel threading building blocks (TBB) runtime library is a popular C++ parallelization environment (D. Bolton, 2007) that offers a set of methods and templates for creating parallel applications. Through support of parallel tasks rather than parallel threads, the TBB runtime library offers improved performance scalability by dynamically redistributing parallel tasks across available processors. This not only creates more scalable, portable parallel applications, but also increases programming productivity by allowing programmers to focus their efforts on identifying concurrency rather than worrying about its management. While many applications benefit from dynamic management of parallelism, dynamic management carries parallelization overhead that increases with increasing core counts and decreasing task sizes. Understanding the sources of these overheads and their implications on application performance can help programmers make more efficient use of available parallelism. Clearly understanding the behavior of these overheads is the first step in creating efficient, scalable parallelization environments targeted at future CMP systems. In this paper we study and characterize some of the overheads of the Intel Threading Building Blocks through the use of real-hardware and simulation performance measurements. Our results show that synchronization overheads within TBB can have a significant and detrimental effect on parallelism performance. Random stealing, while simple and effective at low core counts, becomes less effective as application heterogeneity and core counts increase. Overall, our study provides valuable insights that can be used to create more robust, scalable runtime libraries.
Citations: 127
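The random-stealing policy mentioned in this abstract can be sketched as a round-based simulation. This is a simplified model under stated assumptions, not TBB's scheduler: every task costs one round, each worker pops from the newest end of its own deque, and an idle worker probes one victim chosen uniformly at random per round and steals from its oldest end; a failed probe (itself or an empty victim) wastes the round. The function name `run_work_stealing` is hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Tuple


def run_work_stealing(task_counts: List[int], seed: int = 0) -> Tuple[List[int], int]:
    """Simulate randomized work stealing over unit-cost tasks.

    task_counts[w] is the number of tasks initially queued on worker w.
    Returns (tasks executed per worker, number of failed steal attempts).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    deques: List[Deque[int]] = [deque(range(n)) for n in task_counts]
    executed = [0] * len(deques)
    failed_steals = 0
    remaining = sum(task_counts)
    while remaining:
        for w, own in enumerate(deques):  # one scheduling round
            if own:
                own.pop()  # owner takes its newest task (LIFO end)
                executed[w] += 1
                remaining -= 1
                continue
            victim = rng.randrange(len(deques))
            if victim != w and deques[victim]:
                deques[victim].popleft()  # thief takes the victim's oldest task
                executed[w] += 1
                remaining -= 1
            else:
                failed_steals += 1  # wasted round: picked itself or an empty deque
    return executed, failed_steals


per_worker, failed = run_work_stealing([64, 0, 0, 0], seed=1)
print(per_worker, failed)
```

The failed-steal counter is a rough stand-in for stealing overhead; in this model, adding idle workers without adding work raises the chance that a random probe lands on an empty deque.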
Empirical examination of a collaborative web application
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636094
Christopher Stewart, Matthew Leventi, Kai Shen
Abstract: Online instructional applications, social networking sites, Wiki-based Web sites, and other emerging Web applications that rely on end users for the generation of web content are increasingly popular. However, these collaborative Web applications are still absent from the benchmark suites commonly used in the evaluation of online systems. This paper argues that collaborative Web applications are unlike traditional online benchmarks, and therefore warrant a new class of benchmarks. Specifically, request behaviors in collaborative Web applications are determined by contributions from end users, which leads to qualitatively more diverse server-side resource requirements and execution patterns compared to traditional online benchmarks. Our arguments stem from an empirical examination of WeBWorK, a widely used collaborative Web application that allows teachers to post math or physics problems for their students to solve online. Compared to traditional online benchmarks (like TPC-C, SPECweb, and RUBiS), WeBWorK requests are harder to cluster according to their resource consumption, and they follow less regular patterns. Further, we demonstrate that the use of a WeBWorK-style benchmark would probably have led to different results in some recent research studies concerning request classification from event chains and type-based resource usage prediction.
Citations: 24
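One simple way to quantify how well request type predicts resource consumption is the per-type coefficient of variation (standard deviation divided by mean) of a metric such as CPU time; a low value means requests of that type behave alike. This metric, the function name `per_type_cv`, and the sample numbers below are illustrative assumptions, not data or methodology from the paper. The sketch assumes every sample list is non-empty with a positive mean.

```python
from statistics import mean, pstdev
from typing import Dict, List


def per_type_cv(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Coefficient of variation (population std. dev. / mean) of a resource
    metric, computed separately for each request type."""
    return {rtype: pstdev(vals) / mean(vals) for rtype, vals in samples.items()}


# Hypothetical CPU milliseconds per request, grouped by request type.
cpu_ms = {
    "browse_catalog": [4.0, 4.2, 3.8, 4.0],    # uniform: type predicts cost well
    "render_problem": [2.0, 15.0, 6.0, 40.0],  # varies with user-authored content
}
for rtype, cv in per_type_cv(cpu_ms).items():
    print(f"{rtype}: CV = {cv:.2f}")
```

A type-based resource predictor works well only when per-type variation is small, which is the property the abstract reports is weaker for WeBWorK requests.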
STAMP: Stanford Transactional Applications for Multi-Processing
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636089
C. Minh, Jaewoong Chung, C. Kozyrakis, K. Olukotun
Abstract: Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Application for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.
Citations: 1002
A workload for evaluating deep packet inspection architectures
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636093
M. Becchi, M. Franklin, P. Crowley
Abstract: High-speed content inspection of network traffic is an important new application area for programmable networking systems, and has recently led to several proposals for high-performance regular expression matching. At the same time, the number and complexity of the patterns present in well-known network intrusion detection systems has been rapidly increasing. This increase is important since both the practicality and the performance of specific pattern matching designs are strictly dependent upon characteristics of the underlying regular expression set. However, a commonly agreed-upon workload for the evaluation of deep packet inspection architectures is still missing, leading to frequent unfair comparisons, and to designs lacking in generality or scalability. In this paper, we propose a workload for the evaluation of regular expression matching architectures. The workload includes a regular expression model and a traffic generator, with the former characterizing different levels of expressiveness within rule-sets and the latter characterizing varying degrees of malicious network activity. The proposed workload is used here to evaluate designs (e.g., different memory layouts and hardware organizations) where the matching algorithm is based on compressed deterministic and nondeterministic finite automata (DFAs and NFAs).
Citations: 108
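To make the DFA-based matching in this abstract concrete, here is a minimal sketch of an uncompressed DFA for multi-pattern matching. It swaps in the Aho-Corasick construction, which handles only literal byte-string signatures (a subset of the regular expressions the workload targets), and resolves all failure links into a dense 256-entry transition row per state so that scanning costs one table lookup per payload byte. The function names `build_dfa` and `scan` are hypothetical; this is not the paper's compressed representation.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def build_dfa(patterns: List[bytes]) -> Tuple[List[List[int]], List[Set[int]]]:
    """Build a dense DFA (Aho-Corasick with failure links folded into the table).
    Returns (transition table indexed [state][byte], pattern ids accepted per state)."""
    goto: List[Dict[int, int]] = [{}]  # trie edges; state 0 is the root
    out: List[Set[int]] = [set()]
    for pid, pat in enumerate(patterns):
        state = 0
        for byte in pat:
            if byte not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].add(pid)

    delta = [[0] * 256 for _ in goto]
    fail = [0] * len(goto)
    queue: Deque[int] = deque()
    for byte in range(256):
        nxt = goto[0].get(byte, 0)
        delta[0][byte] = nxt
        if nxt:
            queue.append(nxt)  # depth-1 states fail to the root
    while queue:  # breadth-first, so shallower rows are complete first
        state = queue.popleft()
        out[state] |= out[fail[state]]  # inherit matches that end here too
        for byte in range(256):
            nxt = goto[state].get(byte)
            if nxt is None:
                delta[state][byte] = delta[fail[state]][byte]
            else:
                fail[nxt] = delta[fail[state]][byte]
                delta[state][byte] = nxt
                queue.append(nxt)
    return delta, out


def scan(payload: bytes, delta: List[List[int]], out: List[Set[int]]) -> List[Tuple[int, int]]:
    """Return (end offset, pattern id) for every match in the payload."""
    state = 0
    matches: List[Tuple[int, int]] = []
    for i, byte in enumerate(payload):
        state = delta[state][byte]  # exactly one lookup per input byte
        matches.extend((i, pid) for pid in sorted(out[state]))
    return matches


delta, out = build_dfa([b"he", b"she", b"hers"])
print(scan(b"ushers", delta, out))  # [(3, 0), (3, 1), (5, 2)]
print(f"{len(delta)} states, {len(delta) * 256} transition entries")  # 8 states, 2048 transition entries
```

The dense table grows by 256 entries per state, and DFAs built from complex regular expressions can have very many states; that memory cost is what compressed DFA representations, such as those evaluated with this workload, aim to reduce.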
Workload characterization of selected JEE-based Web 2.0 applications
2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI: 10.1109/IISWC.2008.4636096
P. Nagpurkar, William P. Horn, U. Gopalakrishnan, Niteesh Dubey, J. Jann, P. Pattnaik
Abstract: Web 2.0 represents the evolution of the web from a source of information to a platform. Network advances have permitted users to migrate from desktop applications to so-called Rich Internet Applications (RIAs) characterized by thin clients, which are browser-based and store their state on managed servers. Other Web 2.0 technologies have enabled users to more easily participate, collaborate, and share in web-based communities. With the emergence of wikis, blogs, and social networking, users are no longer only consumers; they become contributors to the collective knowledge accessible on the web. In another Web 2.0 development, content aggregation is moving from portal-based technologies to more sophisticated so-called mashups where aggregation capabilities are greatly expanded. While Web 2.0 has generated a great deal of interest and discussion, there has not been much work on analyzing these emerging workloads. This paper presents a detailed characterization of several applications that exploit Web 2.0 technologies, running on an IBM Power5 system, with the goal of establishing whether the server-side workloads generated by Web 2.0 applications are significantly different from traditional web workloads, and whether they present new challenges to underlying systems. In this paper, we present a detailed characterization of three Web 2.0 workloads, and a synthetic benchmark representing commercial workloads that do not exploit Web 2.0, for comparison.
Citations: 27