2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): Latest Publications

A detailed and flexible cycle-accurate Network-on-Chip simulator

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557149

Nan Jiang, Daniel U. Becker, George Michelogiannakis, J. Balfour, Brian Towles, D. E. Shaw, John Kim, W. Dally

Citations: 639

Understanding the implications of virtual machine management on processor microarchitecture design

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557145

Xiufeng Sui, Tao Sun, Tao Li, Lixin Zhang

Abstract: Cloud computing has demonstrated tremendous capability in a wide spectrum of online services. Virtualization provides an efficient solution to the utilization of modern multicore processor systems while affording significant flexibility. The growing popularity of virtualized datacenters motivates deeper understanding of the interactions between virtual machine management and the micro-architecture behaviors of the privileged domain. We argue that these behaviors must be factored into the design of processor microarchitecture in virtualized datacenters. In this work, we use performance counters on modern servers to study the micro-architectural execution characteristics of the privileged domain while performing various VM management operations. Our study shows that today's state-of-the-art processor still has room for further optimizations when executing virtualized cloud workloads, particularly in the organization of last level caches and on-chip cache coherence protocol. Specifically, our analysis shows that: shared caches could be partitioned to eliminate interference between the privileged domain and guest domains; the cache coherence protocol could support a high degree of data sharing of the privileged domain; and cache capacity or CPU utilization occupied by the privileged domain could be effectively managed when performing management workflows to achieve high system throughput.

Citations: 0

Wall-clock based synchronization: A parallel simulation technology for cluster systems

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557166

Xiaodong Zhu, Junmin Wu, Guoliang Chen, Tao Li

Citations: 4

Use of simple analytic performance models for streaming data applications deployed on diverse architectures

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557162

J. Beard, R. Chamberlain

Citations: 4

Power measurement techniques on standard compute nodes: A quantitative comparison

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557170

D. Hackenberg, T. Ilsche, R. Schöne, Daniel Molka, Maik Schmidt, W. Nagel

Citations: 132

Parallel GPU architecture simulation framework exploiting work allocation unit parallelism

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557151

Sangpil Lee, W. Ro

Abstract: GPU computing is at the forefront of high-performance computing, and it has greatly affected current studies on parallel software and hardware design because of its massively parallel architecture. Therefore, numerous studies have focused on the utilization of GPUs in various fields. However, studies of GPU architectures are constrained by the lack of a suitable GPU simulator. Previously proposed GPU simulators do not have sufficient simulation speed for advanced software and architecture studies. In this paper, we propose a new parallel simulation framework and a parallel simulation technique called work-group parallel simulation in order to improve the simulation speed for modern many-core GPUs. The proposed framework divides the GPU architecture into parallel and shared components, and it determines which GPU component can be effectively parallelized and can work correctly in multithreaded simulation. In addition, the work-group parallel simulation technique effectively boosts the performance of parallelized GPU simulation by eliminating the synchronization overhead. Experimental results obtained using a simulator with the proposed framework show that the proposed parallel simulation technique has a speed-up of up to 4.15 as compared to an existing sequential GPU simulator on an 8-core machine providing minimized cycle errors.
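
As a rough reading of the figures quoted in this abstract (a speed-up of up to 4.15 on an 8-core machine), the sketch below computes the parallel efficiency and, assuming Amdahl's law applies, the parallel fraction such a speed-up would imply. The Amdahl assumption and both function names are illustrative only and are not taken from the paper.

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / cores


def amdahl_parallel_fraction(speedup: float, cores: int) -> float:
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for the parallel fraction p.

    Assumption: the simulator's scaling follows Amdahl's law, which the
    abstract does not claim.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    speedup, cores = 4.15, 8  # figures quoted in the abstract
    print(f"parallel efficiency: {parallel_efficiency(speedup, cores):.3f}")
    print(f"implied parallel fraction: {amdahl_parallel_fraction(speedup, cores):.3f}")
```

Under these assumptions, roughly half of the ideal 8x speed-up is realized, which suggests that the shared (non-parallelized) components still account for a noticeable share of simulation time.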

Citations: 25

Sampled simulation of multi-threaded applications

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557141

Trevor E. Carlson, W. Heirman, L. Eeckhout

Abstract: Sampling is a well-known workload reduction technique that allows one to speed up architectural simulation while accurately predicting performance. Previous sampling methods have been shown to accurately predict single-threaded application runtime based on its overall IPC. However, these previous approaches are unsuitable for general multi-threaded applications, for which IPC is not a good proxy for runtime. Additionally, we find that issues such as application periodicity and inter-thread synchronization play a significant role in determining how best to sample these applications. The proposed multi-threaded application sampling methodology is able to derive an effective sampling strategy for candidate applications using architecture-independent metrics. Using this methodology, large input sets can now be simulated which would otherwise be infeasible, allowing for more accurate conclusions to be made than from studies using scaled-down input sets. Through the use of the proposed methodology, we can simulate less than 10% of the total application runtime in detail. On the SPEComp, NPB and PARSEC benchmarks, running on an 8-core simulated system, we achieve an average absolute error of 3.5%.
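
The abstract reports an average absolute error of 3.5% without defining the metric in this listing. A minimal sketch, assuming the usual definition (the mean over benchmarks of |predicted - reference| / reference): the function names and the runtimes below are hypothetical and are not results from the paper. The signed variant is included only to show why the absolute value matters.

```python
def average_absolute_error(predicted, reference):
    """Mean relative error magnitude across benchmarks (assumed definition)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) / r for p, r in zip(predicted, reference)) / len(reference)


def average_signed_error(predicted, reference):
    """Signed variant, for contrast: over- and under-estimates cancel out."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - r) / r for p, r in zip(predicted, reference)) / len(reference)


if __name__ == "__main__":
    # Hypothetical runtimes in seconds, NOT data from the paper.
    reference = [10.0, 20.0, 40.0]  # full detailed simulation
    predicted = [10.5, 19.0, 40.0]  # estimate from sampled simulation
    print(f"average absolute error: {average_absolute_error(predicted, reference):.2%}")
    print(f"average signed error:   {average_signed_error(predicted, reference):.2%}")
```

With these made-up inputs, one benchmark is overestimated by 5% and another underestimated by 5%, so the signed mean hides errors that the absolute metric still reports.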

Citations: 60

Advancing computer systems without technology progress

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557164

C. Kozyrakis

Abstract: Summary form only given. Computing is now an essential tool for all aspects of human endeavor, including healthcare, education, science, commerce, government, and entertainment. We expect our computers, whether those hidden away in data-centers or those in a handheld form factor, to be capable of running sophisticated algorithms that process rapidly growing volumes of data. In other words, we expect our computers to have exponentially increasing performance at constant cost (energy and chip area). For decades, CMOS technology has been our ally, providing exponential improvements in both transistor density and energy consumption, which we turned into exponential improvements in system performance. Unfortunately, we are now in a phase where transistor cost and energy consumption are barely scaling, making it necessary to rethink the way we build scalable systems. In this talk, we will consider how to advance computer systems without technology progress. There are several promising directions that combined can provide improvements equivalent to several decades of Moore's law. These directions include massive parallelism with locality awareness, specialization, removing the bloat from our infrastructure, increasing system utilization, and embracing approximate computing. We will review motivating results in these areas, establish that they require cross-layer optimizations across both hardware and software, and discuss the remaining challenges that systems researchers must address.

Citations: 15

An analytical framework for estimating TCO and exploring data center design space

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557146

D. Hardy, Marios Kleanthous, I. Sideris, A. Saidi, Emre Ozer, Yiannakis Sazeides

Abstract: In this paper, we present EETCO: an estimation and exploration tool that provides qualitative assessment of data center design decisions on Total-Cost-of-Ownership (TCO) and environmental impact. It can capture the implications of many parameters including server performance, power, cost, and Mean-Time-To-Failure (MTTF). The tool includes a model for spare estimation needed to account for server failures and performance variability. The paper describes the tool model and its implementation, and presents experiments that explore tradeoffs offered by different server configurations, performance variability, MTTF, 2D vs 3D-stacked processors, and ambient temperature. These experiments reveal, for the data center configurations used in this study, several opportunities for profit and optimization in the datacenter ecosystem: (i) servers with different computing performance and power consumption merit exploration to minimize TCO and the environmental impact, (ii) performance variability is desirable if it comes with a drastic cost reduction, (iii) shorter processor MTTF is beneficial if it comes with a moderate processor cost reduction, (iv) increasing by few degrees the ambient datacenter temperature reduces the environmental impact with a minor increase in the TCO and (v) a higher cost for a 3D-stacked processor with shorter MTTF and higher power consumption can be preferred, over a conventional 2D processor, if it offers a moderate performance increase.

Citations: 29

Energy efficiency of lossless data compression on a mobile device: An experimental evaluation

2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) Pub Date: 2013-04-21 DOI: 10.1109/ISPASS.2013.6557156

Armen Dzhagaryan, A. Milenković, Martin Burtscher

Citations: 10