IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005. Latest Articles

A Trace-Driven Simulator For Palm OS Devices
Hyrum D. Carroll, J. Flanagan, Satish Baniya
{"title":"A Trace-Driven Simulator For Palm OS Devices","authors":"Hyrum D. Carroll, J. Flanagan, Satish Baniya","doi":"10.1109/ISPASS.2005.1430570","DOIUrl":"https://doi.org/10.1109/ISPASS.2005.1430570","url":null,"abstract":"Due to the high cost of producing hardware prototypes, software simulators are typically used to determine the performance of proposed systems. To accurately represent a system with a simulator, the simulator inputs need to be representative of actual system usage. Trace-driven simulators that use logs of actual usage are generally preferred by researchers and developers to other types of simulators to determine expected performance. In this paper we explain the design and results of a trace-driven simulator for Palm OS devices capable of starting in a specified state and replaying a log of inputs originally generated on a handheld. We collect the user inputs with an acceptable amount of overhead while a device is executing real applications in normal operating environments. We based our simulator on the deterministic state machine model. The model specifies that two equivalent systems that start in the same state and have the same inputs applied, follow the same execution paths. By replaying the collected inputs we are able to collect traces and performance statistics from the simulator that are representative of actual usage with minimal perturbation. Our simulator can be used to evaluate various hardware modifications to Palm OS devices such as adding a cache. At the end of this paper we present an in-depth case study analyzing the expected memory performance from adding a cache to a Palm m515 device","PeriodicalId":230669,"journal":{"name":"IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133334652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications
Friman Sánchez, M. Alvarez, E. Salamí, Alex Ramírez, M. Valero
{"title":"On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications","authors":"Friman Sánchez, M. Alvarez, E. Salamí, Alex Ramírez, M. Valero","doi":"10.1109/ISPASS.2005.1430571","DOIUrl":"https://doi.org/10.1109/ISPASS.2005.1430571","url":null,"abstract":"SIMD extensions are the most common technique used in current processors for multimedia computing. In order to obtain more performance for emerging applications SIMD extensions need to be scaled. In this paper we perform a scalability analysis of SIMD extensions for multimedia applications. Scaling a 1-dimensional extension, like Intel MMX, was compared to scaling a 2-dimensional (matrix) extension. Evaluations have demonstrated that the 2-d architecture is able to use more parallel hardware than the 1-d extension. Speed-ups over a 2-way superscalar processor with MMX-like extension go up to 4X for kernels and up to 3.3X for complete applications and the matrix architecture can deliver, in some cases, more performance with simpler processor configurations. The experiments also show that the scaled matrix architecture is reaching the limits of the DLP available in the internal loops of common multimedia kernels","PeriodicalId":230669,"journal":{"name":"IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.","volume":"1 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125140160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors
Jian Li, José F. Martínez
{"title":"Power-Performance Implications of Thread-level Parallelism on Chip Multiprocessors","authors":"Jian Li, José F. Martínez","doi":"10.1109/ISPASS.2005.1430567","DOIUrl":"https://doi.org/10.1109/ISPASS.2005.1430567","url":null,"abstract":"We discuss power-performance implications of running parallel applications on chip multiprocessors (CMPs). First, we develop an analytical model that, for the first time, puts together parallel efficiency, granularity, and voltage/frequency scaling, to quantify the performance and power consumption, delivered by a CMP running a parallel code. Then, we conduct detailed simulations of parallel applications running on a power-performance CMP model. Our experiments confirm that our analytical model predicts power-performance behavior reasonably well. Both analytical and experimental models show that parallel computing can bring significant power savings and still meet a given performance target, by choosing granularity and voltage/frequency levels judiciously. The particular choice, however, is dependent on the application's parallel efficiency curve and the process technology utilized, which our model captures. Likewise, analytical model and experiments show the effect of a limited power budget on the application's scalability curve. In particular, we show that a limited power budget can cause a rapid performance degradation beyond a number of cores, even in the case of applications with excellent scalability properties. On the other hand, our experiments show that power-thrifty memory-bound applications can actually enjoy better scalability than more \"nominally scalable\" applications (i.e., without regard to power) when a limited power budget is in place","PeriodicalId":230669,"journal":{"name":"IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114425142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 65
Performance Analysis of a New Packet Trace Compressor based on TCP Flow Clustering
R. Holanda, Javier Verdú, J. García-Vidal, M. Valero
{"title":"Performance Analysis of a New Packet Trace Compressor based on TCP Flow Clustering","authors":"R. Holanda, Javier Verdú, J. García-Vidal, M. Valero","doi":"10.1109/ISPASS.2005.1430576","DOIUrl":"https://doi.org/10.1109/ISPASS.2005.1430576","url":null,"abstract":"In this paper we study the properties of a new packet trace compression method based on clustering of TCP flows. With our proposed method, the compression ratio that we achieve is around 3%, reducing the file size, for instance, from 100 MB to 3 MB. Although this specification defines a lossy compressed data format, it preserves important statistical properties present in the original trace. In order to validate the method, memory performance studies were done with the Radix Tree algorithm executing a trace generated by our method. To give support to these studies, measurements were taken of memory access and cache miss ratio. So far, the results have shown that our proposed method provides a good solution for packet trace compression","PeriodicalId":230669,"journal":{"name":"IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114721855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Scalarization on Short Vector Machines
Yuan Zhao, K. Kennedy
{"title":"Scalarization on Short Vector Machines","authors":"Yuan Zhao, K. Kennedy","doi":"10.1109/ISPASS.2005.1430573","DOIUrl":"https://doi.org/10.1109/ISPASS.2005.1430573","url":null,"abstract":"Scalarization is a process that converts array statements into loop nests so that they can run on a scalar machine. One technical difficulty of scalarization is that temporary storage often needs to be allocated in order to preserve the semantics of array syntax - \"fetch before store\". Many techniques have been developed to reduce the size of temporary storage requirement in order to improve the memory hierarchy performance. With the emergence of short vector units on modern microprocessors, it is interesting to see how to extend the preexisting scalarization methods so that the underlying vector infrastructure is fully utilized, while at the same time keep the temporary storage minimized. In this paper, we extend a loop alignment algorithm for scalarization on short vector machines. The revised algorithm not only achieves vector execution with minimum temporary storage, but also handles data alignment properly, which is very important for performance. Our experiments on two types of widely available architectures demonstrate the effectiveness of our strategy","PeriodicalId":230669,"journal":{"name":"IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117285275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Simulation Differences Between Academia and Industry: A Branch Prediction Case Study
G. Loh
{"title":"Simulation Differences Between Academia and Industry: A Branch Prediction Case Study","authors":"G. Loh","doi":"10.1109/ISPASS.2005.1430556","DOIUrl":"https://doi.org/10.1109/ISPASS.2005.1430556","url":null,"abstract":"Computer architecture research in academia and industry is heavily reliant on simulation studies. While microprocessor companies have the resources to develop highly detailed simulation infrastructures that they correlate against their own silicon, academic researchers tend to use free, widely available simulators. The differences in instruction set architectures, operating systems, simulator models and benchmarks create disconnect between academic and industrial research studies. This paper presents a comparative study to find correlations and differences between the same microarchitecture studies conducted in two different frameworks. Due to the limited availability of industrial simulation frameworks, this research is limited to a case study of branch predictors. Encouragingly, our simulations indicate that several recently proposed branch predictors behave similarly in both environments when evaluated with the SPEC CPU benchmark suite. Unfortunately, we also present results that show that conclusions drawn from studies based on SPEC CPU do not necessarily hold when other applications are considered","PeriodicalId":230669,"journal":{"name":"IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126428448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10