16th Symposium on Computer Architecture and High Performance Computing: Latest Papers

Value predictors for reuse through speculation on traces
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.42
M. Pilla, P. Navaux, B. Childers, Amarildo T. da Costa, F. França
{"title":"Value predictors for reuse through speculation on traces","authors":"M. Pilla, P. Navaux, B. Childers, Amarildo T. da Costa, F. França","doi":"10.1109/CAHPC.2004.42","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.42","url":null,"abstract":"Reusing dynamic sequences of instructions - i.e., traces - improves performance for many benchmarks. However, many traces are not reused because of unavailable inputs in the reuse test. Reuse through speculation on traces (RST) aims to increase the number of reused traces by predicting those inputs when necessary, with minimal additional hardware when compared to nonspeculative trace reuse. In this paper, we compare last n-value and stride-aware prediction for trace inputs. Last n-value prediction uses the last recorded values as predictions, while stride-aware prediction identifies and uses strides to compute new predictions. Stride-aware RST has a higher hardware cost than last n-value RST and has also the shortcoming of not allowing branches inside predicted traces. This paper aims to determine which scheme is the most beneficial for RST. We show that stride values are important for reuse in RST and that last n-value prediction works as well as the more sophisticated stride-aware approach with simpler hardware.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123282205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
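The two prediction schemes compared in the abstract above can be illustrated with a toy software model. This is only a sketch under stated assumptions: the class names, the `candidates`/`update` interface, and the `hit_rate` helper are hypothetical and do not reflect the paper's hardware design or its trace-reuse test.

```python
from collections import deque


class LastNValuePredictor:
    """Toy model: offers the last n recorded values as predictions."""

    def __init__(self, n):
        self.history = deque(maxlen=n)  # oldest value is dropped automatically

    def candidates(self):
        return set(self.history)

    def update(self, value):
        self.history.append(value)


class StridePredictor:
    """Toy model: predicts last value plus the most recently observed stride."""

    def __init__(self):
        self.last = None
        self.stride = 0

    def candidates(self):
        if self.last is None:
            return set()
        return {self.last + self.stride}

    def update(self, value):
        if self.last is not None:
            self.stride = value - self.last
        self.last = value


def hit_rate(predictor, values):
    """Fraction of values that matched a prediction made before seeing them."""
    hits = 0
    for v in values:
        if v in predictor.candidates():
            hits += 1
        predictor.update(v)
    return hits / len(values)
```

On the strided sequence `[0, 4, 8, 12, 16]` the stride model scores 0.6 and the last 2-value model scores 0.0; on the alternating sequence `[1, 2, 1, 2, 1, 2]` the ranking flips. Which pattern dominates real trace inputs is the question the paper evaluates.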
The eDRAM based L3-cache of the BlueGene/L supercomputer processor node
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.40
M. Ohmacht, D. Hoenicke, R. Haring, A. Gara
{"title":"The eDRAM based L3-cache of the BlueGene/L supercomputer processor node","authors":"M. Ohmacht, D. Hoenicke, R. Haring, A. Gara","doi":"10.1109/CAHPC.2004.40","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.40","url":null,"abstract":"BlueGene/L is a supercomputer consisting of 64K dual-processor system-on-a-chip compute nodes, capable of delivering an arithmetic peak performance of 5.6Gflops per node. To match the memory speed to the high compute performance, the system implements an aggressive three-level on-chip cache hierarchy for each node. The implemented hierarchy offers high bandwidth and integrated prefetching on cache hierarchy levels 2 and 3 to reduce memory access time. The integrated L3-cache stores a total of 4MB of data, using multibank embedded DRAM. The 1024 bit wide data port of the embedded DRAM provides 22.4GB/s bandwidth to serve the speculative prefetching demands of the two processor cores and the Gigabit Ethernet DMA engine.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122966733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
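As a rough sanity check on the figures quoted in the abstract above, the port width and bandwidth together imply a transfer rate. The calculation assumes that 1 GB means 10^9 bytes and that every transfer moves the full 1024-bit port width; neither assumption is stated in the abstract, and the variable names are illustrative.

```python
PORT_WIDTH_BITS = 1024            # from the abstract
BANDWIDTH_BYTES_PER_S = 22.4e9    # from the abstract, assuming 1 GB = 10**9 bytes

# Bytes moved by one full-width access of the embedded DRAM port.
bytes_per_transfer = PORT_WIDTH_BITS // 8

# Implied number of full-width transfers per second.
transfers_per_second = BANDWIDTH_BYTES_PER_S / bytes_per_transfer

print(bytes_per_transfer, transfers_per_second)
```

Under those assumptions each access moves 128 bytes, so the quoted bandwidth corresponds to 175 million full-width transfers per second.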
IATO: a flexible EPIC simulation environment
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.20
A. Darsch, André Seznec
{"title":"IATO: a flexible EPIC simulation environment","authors":"A. Darsch, André Seznec","doi":"10.1109/CAHPC.2004.20","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.20","url":null,"abstract":"High-performance superscalar processors are designed with the help of complex simulation environment. The simulation infrastructure permits to validate the processor instruction set and contributes as well to the performance evaluation of the selected microarchitecture. Unfortunately, new architectures like the EPIC are not properly supported in the research community. Due to its specificity, the EPIC architecture requires a new framework that gives the researcher an opportunity to explore the EPIC paradigm by characterizing the static and dynamic behavior of binary programs. In particular, this task is made difficult by the fact that the EPIC architecture defines a fully predicated ISA. This paper presents a novel simulation infrastructure, called IATO that permits to analyze, emulate and simulate the EPIC microarchitecture by using the IA64 ISA as the reference architecture. The novelty of the environment is to provide an in-order and an out-of-order cycle accurate execution-driven simulators. In particular, the out-of-order simulator provides an innovative solution for the out-of-order execution of a fully predicated ISA.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134153949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Performance evaluation of a prototype distributed NFS server
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.33
R. Ávila, P. Navaux, P. Lombard, A. Lèbre, Y. Denneulin
{"title":"Performance evaluation of a prototype distributed NFS server","authors":"R. Ávila, P. Navaux, P. Lombard, A. Lèbre, Y. Denneulin","doi":"10.1109/CAHPC.2004.33","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.33","url":null,"abstract":"A high-performance file system is normally a key point for large cluster installations, where hundreds or even thousands of nodes frequently need to manage large volumes of data. While most solutions usually make use of dedicated hardware and/or specific distribution and replication protocols, the NFSP (NFS Parallel) project aims at improving performance within a standard NFS client/server system. In this paper we investigate the possibilities of a replication model for the NFS server, which is based on Lazy Release Consistency (LRC). A prototype has been built upon the user-level NFSv2 server and a performance evaluation is carried out.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"40 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131249846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Cache filtering techniques to reduce the negative impact of useless speculative memory references on processor performance
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.11
O. Mutlu, Hyesoon Kim, D. N. Armstrong, Y. Patt
{"title":"Cache filtering techniques to reduce the negative impact of useless speculative memory references on processor performance","authors":"O. Mutlu, Hyesoon Kim, D. N. Armstrong, Y. Patt","doi":"10.1109/CAHPC.2004.11","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.11","url":null,"abstract":"High-performance processors employ aggressive speculation and prefetching techniques to increase performance. Speculative memory references caused by these techniques sometimes bring data into the caches that are not needed by correct execution. This paper proposes the use of the first-level caches as filters that predict the usefulness of speculative memory references. With the proposed technique, speculative memory references bring data only into the first-level caches rather than all levels in the cache hierarchy. The processor monitors the use of the cache blocks in the first-level caches and decides which blocks to keep in the cache hierarchy based on the usefulness of cache blocks. It is shown that a simple implementation of this technique usually outperforms inclusive and exclusive baseline cache hierarchies commonly used by today's processors and results in IPC performance improvements of up to 9.2% on the SPEC2000 integer benchmarks.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116026716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
ArchC: a SystemC-based architecture description language
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.8
S. Rigo, G. Araújo, Marcus Bartholomeu, R. Azevedo
{"title":"ArchC: a systemC-based architecture description language","authors":"S. Rigo, G. Araújo, Marcus Bartholomeu, R. Azevedo","doi":"10.1109/CAHPC.2004.8","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.8","url":null,"abstract":"This paper presents an architecture description language (ADL) called ArchC, which is an open-source SystemC-based language that is specialized for processor architecture description. Its main goal is to provide enough information, at the right level of abstraction, in order to allow users to explore and verify new architectures, by automatically generating software tools like simulators and co-verification interfaces. ArchC's key features are a storage-based co-verification mechanism that automatically checks the consistency of a refined ArchC model against a reference (functional) description, memory hierarchy modeling capability, the possibility of integration with other SystemC IPs and the automatic generation of high-level SystemC simulators. We have used ArchC to synthesize both functional and cycle-based simulators for the MIPS, Intel 8051 and SPARC V8 processors, as well as functional models of modern architectures like TMS320C62x, XScale and PowerPC.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116100815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 110
Improving parallel execution time of sorting on heterogeneous clusters
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.21
C. Cérin, Michel Koskas, Hazem Fkaier, M. Jemni
{"title":"Improving parallel execution time of sorting on heterogeneous clusters","authors":"C. Cérin, Michel Koskas, Hazem Fkaier, M. Jemni","doi":"10.1109/CAHPC.2004.21","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.21","url":null,"abstract":"The aim of the paper is to introduce techniques in order to optimize the parallel execution time of sorting on heterogeneous platforms (processors speeds are related by a constant factor). We develop a constant time technique for mastering processor load balancing and execution time in an heterogeneous environment. We develop an analytical model for the parallel execution time, sustained by preliminary experimental results in the case of a 2-processors systems. The computation of the solution is independent of the problem size. Consequently, there is no overhead regarding the sorting problem.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125132603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A new migration model based on the evaluation of processes load and lifetime on heterogeneous computing environments
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.2
R. Mello, Luciano José Senger
{"title":"A new migration model based on the evaluation of processes load and lifetime on heterogeneous computing environments","authors":"R. Mello, Luciano José Senger","doi":"10.1109/CAHPC.2004.2","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.2","url":null,"abstract":"This paper presents a new model for evaluation of the positive and negative impacts related to the process migration on environments composed by heterogeneous capacity computers. On this model, a busy computer analyzes the occupation of each process and selects the more adequate for migration. The analysis and selection are done through a migration factor. This factor reflects how much the busy computer will be freed and how much the destination computer will be overloaded, in view of the migration of each process. The migrated processes are the ones that present migration factors to enhance the environment load balancing. The results from the carried out experiments have proved this model contributions when compared to related work. The contribution is the decrease in process average response time, which means higher performance.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128137091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
Revisiting a BSP/CGM transitive closure algorithm
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.36
E. Cáceres, Cristiano C. A. Vieira
{"title":"Revisiting a BSP/CGM transitive closure algorithm","authors":"E. Cáceres, Cristiano C. A. Vieira","doi":"10.1109/CAHPC.2004.36","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.36","url":null,"abstract":"We present a new BSP/CGM parallel algorithm for the transitive closure problem. Our algorithm uses $O(n^3/p\\alpha)$ local computation time with $O(p/\\alpha)$ communication rounds, where $\\alpha$ is the size in bits that can be stored in a primitive data item. For all the randomly generated graphs that were used in the tests, the number of communication rounds was bounded by $\\log p \\backslash \\alpha + 1$. Our algorithm, even for the worst case, improves the previous results. The algorithm was implemented and the results show the efficiency and scalability of the presented algorithm and compare favorably with other parallel implementations.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"516 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133132622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
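The abstract above measures cost in terms of the number of bits that fit in one primitive data item. A plausible reading is that adjacency rows are packed into machine words so that one word operation updates many entries at once. The sketch below shows that idea with a sequential, bit-packed Warshall closure; it is not the paper's BSP/CGM algorithm, the function name is hypothetical, and Python integers stand in for packed words.

```python
def transitive_closure(adj_rows, n):
    """Sequential Warshall closure on bit-packed adjacency rows.

    adj_rows[i] is an integer whose bit j is set when there is an edge i -> j.
    Returns new rows where bit j of row i is set when j is reachable from i.
    """
    rows = list(adj_rows)
    for k in range(n):
        mask = 1 << k
        for i in range(n):
            # If i reaches k, then i also reaches everything k reaches.
            # The OR merges a whole packed row in one operation.
            if rows[i] & mask:
                rows[i] |= rows[k]
    return rows


# Chain 0 -> 1 -> 2: vertex 0 should also reach vertex 2.
closure = transitive_closure([0b010, 0b100, 0b000], 3)
```

With fixed-width words, each row OR would touch one word per group of packed bits rather than one entry per vertex, which is where a divisor of that kind can come from.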
A performance evaluation of ARM ISA extension for elliptic curve cryptography over binary finite fields
16th Symposium on Computer Architecture and High Performance Computing Pub Date : 2004-10-27 DOI: 10.1109/CAHPC.2004.5
S. Bartolini, I. Branovic, R. Giorgi, E. Martinelli
{"title":"A performance evaluation of ARM ISA extension for elliptic curve cryptography over binary finite fields","authors":"S. Bartolini, I. Branovic, R. Giorgi, E. Martinelli","doi":"10.1109/CAHPC.2004.5","DOIUrl":"https://doi.org/10.1109/CAHPC.2004.5","url":null,"abstract":"In this paper, we present an evaluation of possible ARM instruction set extension for elliptic curve cryptography (ECC) over binary finite fields GF($2^m$). The use of elliptic curve cryptography is becoming common in embedded domain, where its reduced key size at a security level equivalent to standard public-key methods (such as RSA) allows for power consumption savings and more efficient operation. ARM processor was selected because it is widely used for embedded system applications. We developed an ECC benchmark set with three widely used public-key algorithms: Diffie-Hellman for key exchange, digital signature algorithm, as well as El-Gamal method for encryption/decryption. We analyzed the major bottlenecks at function level and evaluated the performance improvement, when we introduce some simple architectural support in the ARM ISA. Results of our experiments show that the use of a word-level multiplication instruction over binary field allows for an average 33% reduction of the total number of dynamically executed instructions, while execution time improves by the same amount when projective coordinates are used.","PeriodicalId":375288,"journal":{"name":"16th Symposium on Computer Architecture and High Performance Computing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128169160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
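The word-level multiplication over a binary field mentioned in the abstract above is, in software terms, a carry-less (XOR-based) multiply. The sketch below shows the shift-and-XOR loop that such an instruction would replace, plus a reduction step for a field GF(2^m). The function names are hypothetical, and the example polynomial is the well-known AES field polynomial, chosen only because its products are easy to check; it is not taken from the paper.

```python
def clmul(a, b):
    """Carry-less multiply: polynomial product of a and b over GF(2)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m), reducing modulo poly (which has degree m)."""
    prod = clmul(a, b)
    # Clear bits from the top down to bit m by XORing in a shifted copy of poly.
    for bit in range(prod.bit_length() - 1, m - 1, -1):
        if (prod >> bit) & 1:
            prod ^= poly << (bit - m)
    return prod


# (x + 1) * (x + 1) = x^2 + 1 over GF(2).
square = clmul(0b11, 0b11)

# GF(2^8) with x^8 + x^4 + x^3 + x + 1 (0x11B), as used by AES.
aes_product = gf2m_mul(0x57, 0x83, 0x11B, 8)
```

A single hardware instruction that performs the `clmul` loop on whole words removes the per-bit branch and shift, which is consistent with the reduction in dynamically executed instructions that the abstract reports.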