2011 IEEE International Symposium on Workload Characterization (IISWC): Latest Papers

Hierarchically characterizing CUDA program behavior
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114201
Zhibin Yu, Hai Jin, Nilanjan Goswami, Tao Li, L. John
Abstract: CUDA has become a very popular programming paradigm in the parallel computing area. However, very little work has been done on characterizing CUDA kernels. In this work, we measure the thread-level performance, collect the basic-block-level characteristics, and glean the instruction-level properties for about 35 programs from the CUDA SDK, Parboil, and Rodinia benchmark suites. In addition, we define basic block vectors, synchronization vectors, and a thread similarity matrix to capture the characteristics of CUDA programs efficiently. We find that CUDA programs have some unique characteristics at each level compared to sequential programs.
Citations: 1
Full-system analysis and characterization of interactive smartphone applications
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114205
Anthony Gutierrez, R. Dreslinski, T. Wenisch, T. Mudge, A. Saidi, C. D. Emmons, N. Paver
Abstract: Smartphones have recently overtaken PCs as the primary consumer computing device in terms of annual unit shipments. Given this rapid market growth, it is important that mobile system designers and computer architects analyze the characteristics of the interactive applications users have come to expect on these platforms. With the introduction of high-performance, low-power, general-purpose CPUs in the latest smartphone models, users now expect PC-like performance and a rich user experience, including high-definition audio and video, high-quality multimedia, dynamic web content, responsive user interfaces, and 3D graphics. In this paper, we characterize the microarchitectural behavior of representative smartphone applications on a current-generation mobile platform to identify trends that might impact future designs. To this end, we measure a suite of widely available mobile applications for audio, video, and interactive gaming. To complete this suite we developed BBench, a new fully-automated benchmark to assess a web browser's performance when rendering some of the most popular and complex sites on the web. We contrast these applications' characteristics with those of the SPEC CPU2006 benchmark suite. We demonstrate that real-world interactive smartphone applications differ markedly from the SPEC suite. Specifically, the instruction cache, instruction TLB, and branch predictor suffer from poor performance. We conjecture that this is due to the applications' reliance on numerous high-level software abstractions (shared libraries and OS services). Similar trends have been observed for UI-intensive interactive applications on the desktop.
Citations: 187
Parallelization and characterization of pattern matching using GPUs
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114181
G. Vasiliadis, M. Polychronakis, S. Ioannidis
Abstract: Pattern matching is a highly computationally intensive operation used in a plethora of applications. Unfortunately, due to the ever-increasing storage capacity and link speeds, the amount of data that needs to be matched against a given set of patterns is growing rapidly. In this paper, we explore how the highly parallel computational capabilities of commodity graphics processing units (GPUs) can be exploited for high-speed pattern matching. We present the design, implementation, and evaluation of a pattern matching library running on the GPU, which can be used transparently by a wide range of applications to increase their overall performance. The library supports both string searching and regular expression matching on the NVIDIA CUDA architecture. We have also explored the performance impact of different types of memory hierarchies, and present solutions to alleviate memory congestion problems. The results of our performance evaluation using off-the-shelf graphics processors demonstrate that GPU-based pattern matching can reach tens of gigabits per second on different workloads.
Citations: 43
The Multi-Program Performance Model: Debunking current practice in multi-core simulation
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114194
K. V. Craeynest, L. Eeckhout
Abstract: Composing a representative multi-program multi-core workload is non-trivial. A multi-core processor can execute multiple independent programs concurrently, and hence, any program mix can form a potential multi-program workload. Given the very large number of possible multi-program workloads and the limited speed of current simulation methods, it is impossible to evaluate all possible multi-program workloads.
Citations: 19
Autocorrelation analysis: A new and improved method for branch predictability characterization
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114179
Jing Chen, L. John
Abstract: Branch predictability characterization not only helps to improve branch prediction but also helps to optimize predicated execution. Branch taken rate and branch transition rate have been proposed to characterize branch predictability. However, these two metrics may misclassify branches with regular history patterns as hard-to-predict branches, causing an inaccurate and ambiguous view of branch predictability. In this paper, we utilize autocorrelation-based analysis of branch history patterns and present two orthogonal metrics, Degree of Pattern Irregularity (DPI) and Effective Pattern Length (EPL). Unlike the existing taken rate or transition rate, DPI directly measures the regularity of the patterns in per-address branch history, and hence is more accurate in branch classification. On the other hand, EPL reveals the optimum branch history length for the easy-to-predict branches. The proposed metrics are evaluated with PAs, GAs, and Perceptron branch predictors, and the results show that on average, DPI improves the accuracy of hard-to-predict branch classification by up to 17.7% over taken rate and 15.0% over transition rate for the workloads in this study. It is also able to identify 18.9% more easy-to-predict branches compared with taken rate and 12.8% more compared with transition rate. The proposed metrics are a valuable extension to the existing metrics for accurately characterizing branch predictability.
Citations: 0
Empirical Web server power modeling and characterization
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114200
Leonardo Piga, R. Bergamaschi, F. Klein, R. Azevedo, S. Rigo
Abstract: Commodity processors, which are prevalent in Internet-based data centers, do not have internal sensors for monitoring energy consumption. Such processors usually feature performance counters which can be used to indirectly estimate power consumption [1]. The usual approach in those studies is to derive linear power models based on the usage numbers collected for processor sub-components such as caches and the branch predictor. These models are usually targeted at CPU-bound applications, which need more CPU performance counter parameters and display high CPU usage most of the time. In a Web server environment, the applications are mostly I/O-bound, which creates non-linear effects among server performance statistics and power, making these models less suitable for Web servers. This paper presents a new approach to power models for Web servers, based on ranges of CPU usage values and server performance statistics. This new method softens the non-linear relationship between server statistics and power consumption in linear power models, improving their accuracy.
Citations: 4
On the memory system requirements of future scientific applications: Four case-studies
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114176
Milan Pavlović, Yoav Etsion, Alex Ramírez
Abstract: In this paper, we observe and characterize the memory behaviour, and specifically memory footprint, memory bandwidth and cache effectiveness, of several well-known parallel scientific applications running on a large processor cluster. Based on the analysis of their instrumented execution, we project some performance requirements for future memory systems serving large-scale chip multiprocessors (CMPs). In addition, we estimate the impact of memory system performance on the amount of instruction stalls, as well as on the real computational performance, using the number of floating point operations per second the applications perform. Our projections show that the limitations of present memory technologies, in terms of either capacity or bandwidth, will have a strong negative impact on the scalability of memory systems for large CMPs. We conclude that future supercomputer systems require research on new alternative memory architectures, capable of offering both capacity and bandwidth beyond what current solutions provide.
Citations: 27
Quantifying the common computational problems in contemporary applications
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114199
R. Jongerius, Phillip Stanley-Marbell, H. Corporaal
Abstract: Selecting, for each application, the top five functions for manual code inspection resulted in analyzing a portion of the source code accounting for 77% of the total run time. Figure 2 shows the fraction of the analyzed run time covered by the 16 identified CPs (some are condensed in one slice for clarity).
Citations: 7
Decoupling datacenter studies from access to large-scale applications: A modeling approach for storage workloads
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114196
Christina Delimitrou, S. Sankar, Kushagra Vaid, C. Kozyrakis
Abstract: The cost and power impact of suboptimal storage configurations is significant in datacenters (DCs) as inefficiencies are aggregated over several thousand servers and represent considerable losses in capital and operating costs. Designing performance, power and cost-optimized systems requires a deep understanding of target workloads, and mechanisms to effectively model different storage design choices. Traditional benchmarking is invalid in cloud data-stores, representative storage profiles are hard to obtain, while replaying the entire application in all storage configurations is impractical both from a cost and time perspective. Despite these issues, current workload generators are not able to accurately reproduce key aspects of real application patterns. Some of these features include spatial and temporal locality, as well as tuning the intensity of the workload to emulate different storage system configurations. To address these limitations, we propose a modeling and characterization framework for large-scale storage applications. As part of this framework we use a state diagram-based storage model, extend it to a hierarchical representation and implement a tool that consistently recreates I/O loads of DC applications. We present the principal features of the framework that allow accurate modeling and generation of storage workloads and the validation process performed against ten original DC application traces. Furthermore, using our framework, we perform an in-depth, per-thread characterization of these applications and provide insights on their behavior. Finally, we explore two practical applications of this methodology: SSD caching and defragmentation benefits on enterprise storage. In both cases we observe significant speedup for most of the examined applications. Since knowledge of the workload's spatial and temporal locality is necessary to model these use cases, our framework was instrumental in quantifying their performance benefits. The proposed methodology provides a detailed understanding of the storage activity of large-scale applications and enables a wide spectrum of storage studies without the requirement for access to real applications and full application deployment.
Citations: 25
A quantitative analysis of cooling power in container-based data centers
2011 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2011-11-06 DOI: 10.1109/IISWC.2011.6114197
Amer Qouneh, Chao Li, Tao Li
Abstract: Cooling power is often represented as a single taxed cost on the total energy consumption of the data center. Some estimates go as far as 50% of the total energy demand. However, this view is rather simplistic in the presence of a multitude of cooling options and optimizations. In response to the rising cost of energy, the industry introduced modular design in the form of containers to serve as the new building block for data centers. However, it is still unclear how efficient they are compared to raised-floor data centers and under what conditions they are preferred. In this paper, we provide a comparative and quantitative analysis of cooling power in both container-based and raised-floor data centers. Our results show that a container achieves 80% and 42% savings in cooling and facility power, respectively, compared to a raised-floor data center, and that savings of 41% in cooling power are possible when workloads are consolidated onto the least number of containers. We also show that cooling optimizations are not very effective at high utilizations, and that a raised-floor data center can approach the efficiency of a container at low utilizations when employing a simple cooling optimization.
Citations: 23