IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Latest papers, page 2

Motivation for Variable Length Intervals and Hierarchical Phase Behavior

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430568

Jeremy Lau, Erez Perelman, Greg Hamerly, T. Sherwood, B. Calder

Citations: 91

A High Performance, Energy Efficient GALS Processor Microarchitecture with Reduced Implementation Complexity

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430558

Yongkang Zhu, D. Albonesi, A. Buyuktosunoglu

Abstract: As the costs and challenges of global clock distribution grow with each new microprocessor generation, a globally asynchronous, locally synchronous (GALS) approach becomes an attractive alternative. One proposed GALS approach, called a multiple clock domain (MCD) processor, achieves impressive energy savings for a relatively low performance cost. However, the approach requires separating the processor into four domains, including separating the integer and memory domains which complicates load scheduling, and the implementation of 32 voltage and frequency levels in each domain. In addition, the hardware-based control algorithm, though effective overall, produces a significant performance degradation for some applications. In this paper, we devise modifications to the MCD design that retain many of its benefits while greatly reducing the implementation complexity. We first determine that the synchronization channels that are most responsible for the MCD performance degradation are those involving cache access, and propose merging the integer and memory domains to virtually eliminate this overhead. We further propose significantly reducing the number of voltage levels, separating the reorder buffer into its own domain to permit front-end frequency scaling, separating the L2 cache to permit standard power optimizations to be used, and a new online algorithm that provides consistent results across our benchmark suite. The overall result is a significant reduction in the performance degradation of the original MCD approach and greater energy savings, with a greatly simplified microarchitecture that is much easier to implement.

Citations: 20

Measuring Program Similarity: Experiments with SPEC CPU Benchmark Suites

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430555

Aashish Phansalkar, A. Joshi, L. Eeckhout, L. John

Abstract: It is essential that a subset of benchmark programs used to evaluate an architectural enhancement is well distributed within the target workload space rather than clustered in specific areas. Past efforts for identifying subsets have primarily relied on using microarchitecture-dependent metrics of program performance, such as cycles per instruction and cache miss-rate. The shortcoming of this technique is that the results could be biased by the idiosyncrasies of the chosen configurations. The objective of this paper is to present a methodology to measure similarity of programs based on their inherent microarchitecture-independent characteristics which will make the results applicable to any microarchitecture. We apply our methodology to the SPEC CPU2000 benchmark suite and demonstrate that a subset of 8 programs can be used to effectively represent the entire suite. We validate the usefulness of this subset by using it to estimate the average IPC and L1 data cache miss-rate of the entire suite. The average IPC of 8-way and 16-way issue superscalar processor configurations could be estimated with 3.9% and 4.4% error respectively. This methodology is applicable not only to find subsets from a benchmark suite, but also to identify programs for a benchmark suite from a list of potential candidates. Studying the four generations of SPEC CPU benchmark suites, we find that other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained the same.
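The idea of choosing a subset that is well distributed in the workload space, rather than clustered, can be illustrated with a small sketch. The abstract does not say which distance measure or selection procedure the authors use, so everything here is an assumption: z-score normalization, Euclidean distance, and greedy farthest-point selection stand in for the paper's actual methodology, and the program names and feature values are made up for illustration.

```python
import math


def zscore_columns(rows):
    """Normalize each feature column to zero mean and unit variance."""
    normalized_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        # A constant column carries no information; map it to zeros.
        normalized_cols.append([(x - mean) / std if std > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*normalized_cols)]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_spread_subset(features, k):
    """Greedy farthest-point selection of k program names (assumed heuristic)."""
    names = sorted(features)
    vectors = dict(zip(names, zscore_columns([features[n] for n in names])))
    chosen = [names[0]]  # arbitrary but deterministic starting point
    while len(chosen) < k:
        # Add the program whose nearest already-chosen neighbor is farthest away.
        best = max(
            (n for n in names if n not in chosen),
            key=lambda n: min(euclidean(vectors[n], vectors[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Hypothetical microarchitecture-independent features per program,
# e.g. [load/store fraction, branch fraction, a locality metric].
FEATURES = {
    "prog_a": [0.30, 0.10, 2.0],
    "prog_b": [0.31, 0.11, 2.1],  # nearly identical to prog_a
    "prog_c": [0.60, 0.02, 8.0],
    "prog_d": [0.10, 0.40, 1.0],
}

subset = pick_spread_subset(FEATURES, 3)
print(subset)  # prog_b is skipped because it sits next to prog_a
```

Because the features are normalized before distances are computed, no single characteristic dominates just because of its units.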

Citations: 172

Accelerating Multiprocessor Simulation with a Memory Timestamp Record

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430560

K. Barr, Heidi Pan, Michael Zhang, K. Asanović

Citations: 50

Dataflow: A Complement to Superscalar

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430572

M. Budiu, Pedro V. Artigas, S. Goldstein

Citations: 44

Anatomy and Performance of SSL Processing

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430574

Li Zhao, R. Iyer, S. Makineni, L. Bhuyan

Abstract: A wide spectrum of e-commerce (B2B/B2C), banking, financial trading and other business applications require the exchange of data to be highly secure. The Secure Sockets Layer (SSL) protocol provides the essential ingredients of secure communications - privacy, integrity and authentication. Though it is well-understood that security always comes at the cost of performance, these costs depend on the cryptographic algorithms. In this paper, we present a detailed description of the anatomy of a secure session. We analyze the time spent on the various cryptographic operations (symmetric, asymmetric and hashing) during the session negotiation and data transfer. We then analyze the most frequently used cryptographic algorithms (RSA, AES, DES, 3DES, RC4, MD5 and SHA-1). We determine the key components of these algorithms (setting up key schedules, encryption rounds, substitutions, permutations, etc) and determine where most of the time is spent. We also provide an architectural analysis of these algorithms, show the frequently executed instructions and discuss the ISA/hardware support that may be beneficial to improving SSL performance. We believe that the performance data presented in this paper is useful to performance analysts and processor architects to help accelerate SSL performance in future processors.

Citations: 70

The Strong Correlation Between Code Signatures and Performance

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430578

Jeremy Lau, J. Sampson, Erez Perelman, Greg Hamerly, B. Calder

Abstract: A recent study examined the use of sampled hardware counters to create sampled code signatures. This approach is attractive because sampled code signatures can be quickly gathered for any application. The conclusion of their study was that there exists a fuzzy correlation between sampled code signatures and performance predictability. The paper raises the question of how much information is lost in the sampling process, and our paper focuses on examining this issue. We first focus on showing that there exists a strong correlation between code signatures and performance. We then examine the relationship between sampled and full code signatures, and how these affect performance predictability. Our results confirm that there is a fuzzy correlation found in recent work for the SPEC programs with sampled code signatures, but that a strong correlation exists with full code signatures. In addition, we propose converting the sampled instruction counts, used in the prior work, into sampled code signatures representing loop and procedure execution frequencies. These sampled loop and procedure code signatures allow phase analysis to more accurately and easily find patterns, and they correlate better with performance.

Citations: 100

BioBench: A Benchmark Suite of Bioinformatics Applications

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430554

K. Albayraktaroglu, A. Jaleel, Xue Wu, Manoj Franklin, Bruce Jacob, C. Tseng, Donald Yeung

Abstract: Recent advances in bioinformatics and the significant increase in computational power available to researchers have made it possible to make better use of the vast amounts of genetic data that has been collected over the last two decades. As the uses of genetic data expand to include drug discovery and development of gene-based therapies, bioinformatics is destined to take its place in the forefront of scientific computing application domains. Despite the clear importance of this field, common bioinformatics applications and their implication on microarchitectural design have received scant attention from the computer architecture community so far. The availability of a common set of bioinformatics benchmarks could be the first step to motivate further research in this crucial area. To this end, this paper presents BioBench, a benchmark suite that represents a diverse set of bioinformatics applications. The first version of BioBench includes applications from different application domains, with a particular emphasis on mature genomics applications. The applications in the benchmark are described briefly, and basic execution characteristics obtained on a real processor are presented. Compared to SPEC INT and SPEC FP benchmarks, applications in BioBench display a higher percentage of load/store instructions, almost negligible floating-point operation content, and higher IPC than either SPEC INT or SPEC FP applications. Our evaluation suggests that bioinformatics applications have distinctly different characteristics from the applications in both of the mentioned SPEC suites; and our findings indicate that bioinformatics workloads can benefit from architectural improvements to memory bandwidth and techniques that exploit their high levels of ILP. The entire BioBench suite and accompanying reference data will be made freely available to researchers.

Citations: 150

Pro-active Page Replacement for Scientific Applications: A Characterization

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430579

M. Vilayannur, A. Sivasubramaniam, M. Kandemir

Abstract: Paging policies implemented by today's operating systems cause scientific applications to exhibit poor performance, when the application's working set does not fit in main memory. This has been typically attributed to the sub-optimal performance of LRU-like virtual-memory replacement algorithms. On one end of the spectrum, researchers in the past have proposed fully automated compiler-based techniques that provide crucial information on future access patterns (reuse-distances, release hints etc) of an application that can be exploited by the operating system to make intelligent prefetching and replacement decisions. Static techniques like the aforementioned can be quite accurate, but require that the source code be available and analyzable. At the other end of the spectrum, researchers have also proposed pure system-level algorithmic innovations to improve the performance of LRU-like algorithms, some of which are only interesting from the theoretical sense and may not really be implementable. Instead, in this paper we explore the possibility of tracking application's runtime behavior in the operating system, and find that there are several useful characteristics in the virtual memory behavior that can be anticipated and used to pro-actively manage physical memory usage. Specifically, we show that LRU-like replacement algorithms hold onto pages long after they outlive their usefulness and propose a new replacement algorithm that exploits the predictability of the application's page-fault patterns to reduce the number of page-faults. Our results demonstrate that such techniques can reduce page-faults by as much as 78% over both LRU and EELRU that is considered to be one of the state-of-the-art algorithms towards addressing the performance shortcomings of LRU. Further, we also present an implementable replacement algorithm within the operating system, that performs considerably better than the Linux kernel's replacement algorithm.
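The weakness of LRU when a working set does not fit in memory can be seen in a short simulation. This is only a sketch of the plain LRU baseline, not the replacement algorithm the paper proposes; the function name and the reference string are illustrative assumptions. A loop that cycles over one more page than there are frames is the classic worst case: LRU always evicts the page that will be needed next.

```python
from collections import OrderedDict


def lru_page_faults(reference_string, num_frames):
    """Count page faults for plain LRU replacement with num_frames frames."""
    frames = OrderedDict()  # keys are resident pages, ordered oldest-use first
    faults = 0
    for page in reference_string:
        if page in frames:
            # Hit: mark the page as most recently used.
            frames.move_to_end(page)
        else:
            faults += 1
            if len(frames) >= num_frames:
                # Evict the least recently used page (front of the ordering).
                frames.popitem(last=False)
            frames[page] = True
    return faults


# Hypothetical access pattern: a loop sweeping over 4 pages, 5 times.
cyclic_scan = [0, 1, 2, 3] * 5

print(lru_page_faults(cyclic_scan, 3))  # 20: every access faults
print(lru_page_faults(cyclic_scan, 4))  # 4: only the cold misses
```

With one frame fewer than the loop needs, every one of the 20 accesses faults, which is the kind of regular, predictable fault pattern the abstract suggests an operating system could anticipate.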

Citations: 3

A Trace-Driven Simulator For Palm OS Devices

IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Pub Date : 2005-03-20 DOI: 10.1109/ISPASS.2005.1430570

Hyrum D. Carroll, J. Flanagan, Satish Baniya

Abstract: Due to the high cost of producing hardware prototypes, software simulators are typically used to determine the performance of proposed systems. To accurately represent a system with a simulator, the simulator inputs need to be representative of actual system usage. Trace-driven simulators that use logs of actual usage are generally preferred by researchers and developers to other types of simulators to determine expected performance. In this paper we explain the design and results of a trace-driven simulator for Palm OS devices capable of starting in a specified state and replaying a log of inputs originally generated on a handheld. We collect the user inputs with an acceptable amount of overhead while a device is executing real applications in normal operating environments. We based our simulator on the deterministic state machine model. The model specifies that two equivalent systems that start in the same state and have the same inputs applied, follow the same execution paths. By replaying the collected inputs we are able to collect traces and performance statistics from the simulator that are representative of actual usage with minimal perturbation. Our simulator can be used to evaluate various hardware modifications to Palm OS devices such as adding a cache. At the end of this paper we present an in-depth case study analyzing the expected memory performance from adding a cache to a Palm m515 device.
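The deterministic state machine model described in the abstract can be sketched in a few lines: if the transition function depends only on the current state and the next input, then replaying a recorded input log from the same starting state reproduces the same execution. The event kinds, state fields, and function names below are hypothetical and are not taken from the Palm OS simulator.

```python
def step(state, event):
    """Pure transition function: the next state depends only on (state, event)."""
    kind, value = event
    if kind == "pen_down":
        return {**state, "pen": value}
    if kind == "key":
        return {**state, "text": state["text"] + value}
    return state  # unknown events leave the state unchanged


def run(initial_state, events):
    """Apply a log of events in order and return the full state trace."""
    state = dict(initial_state)
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


# Hypothetical starting state and input log captured on a "device".
INITIAL = {"pen": None, "text": ""}
LOG = [("key", "h"), ("pen_down", (10, 20)), ("key", "i")]

device_trace = run(INITIAL, LOG)     # original execution
simulator_trace = run(INITIAL, LOG)  # replay in the "simulator"

# Same start state plus same inputs gives the same execution path.
print(device_trace == simulator_trace)
print(simulator_trace[-1])
```

The property only holds as long as `step` has no hidden inputs such as wall-clock time or random numbers; in a real system those would have to be logged and replayed as events too.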

Citations: 5