2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems: Latest Papers

Acceleration for MPI Derived Datatypes Using an Enhancer of Memory and Network
N. Tanabe, H. Nakajo
{"title":"Acceleration for MPI Derived Datatypes Using an Enhancer of Memory and Network","authors":"N. Tanabe, H. Nakajo","doi":"10.1007/978-3-540-87475-1_46","DOIUrl":"https://doi.org/10.1007/978-3-540-87475-1_46","url":null,"abstract":"","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"21 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132193896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Introspection-Based Fault Tolerance for COTS-Based High-Capability Computation in Space
M. James, A. Shapiro, P. Springer, H. Zima
{"title":"Introspection-Based Fault Tolerance for COTS-Based High-Capability Computation in Space","authors":"M. James, A. Shapiro, P. Springer, H. Zima","doi":"10.1109/IWIA.2008.11","DOIUrl":"https://doi.org/10.1109/IWIA.2008.11","url":null,"abstract":"Future missions of deep space exploration face the challenge of designing, building, and operating progressively more capable autonomous spacecraft and planetary rovers. Given the communication latencies and bandwidth limitations for such missions, the need for increased autonomy becomes mandatory, along with the requirement for enhanced on-board computational capabilities while in deep space or time-critical situations. This will result in dramatic changes in the way missions will be conducted and supported by on-board computing systems. Specifically, the traditional approach of relying exclusively on radiation-hardened hardware and modular redundancy will not be able to deliver the required computational power. As a consequence, such systems are expected to include high-capability low-power components based on emerging Commercial-Off-The-Shelf (COTS) multi-core technology. This paper describes the design of a generic framework for introspection that supports runtime monitoring and analysis of program execution as well as a feedback-oriented recovery from faults. One of the first applications of this framework will be to provide flexible software fault tolerance matched to the requirements and properties of applications by exploiting knowledge that is either contained in an application knowledge base, provided by users, or automatically derived from specifications. 
A prototype implementation is currently in progress at the Jet Propulsion Laboratory, California Institute of Technology, targeting a cluster of Cell Broadband Engines.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131605921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A PLD Architecture for High Performance Computing
Naoki Hirakawa, Masanori Yoshihara, K. Tanigawa, T. Hironaka, Masayuki Sato
{"title":"A PLD Architecture for High Performance Computing","authors":"Naoki Hirakawa, Masanori Yoshihara, K. Tanigawa, T. Hironaka, Masayuki Sato","doi":"10.1109/IWIA.2008.12","DOIUrl":"https://doi.org/10.1109/IWIA.2008.12","url":null,"abstract":"In recent years, Field Programmable Gate Arrays (FPGAs) have been used for High Performance Computing (HPC). Because there is a significant difference between the configuration speed of an FPGA and the execution speed of a Central Processing Unit (CPU), the difference causes performance degradation. To resolve this problem, we proposed MPLD as a new Programmable Logic Device (PLD) architecture with high speed reconfiguration. The merits of the MPLD in HPC are high speed configuration and easy partial configuration. This is achieved by a configuration method which is the same as the write memory access of conventional parallel memory. In this paper, we describe the problems of using FPGAs in HPC, and present the MPLD architecture which solves the problems. Some evaluation results of the prototype MPLD chip, which was implemented using ROHM 0.18 µm CMOS technology with five metal layers, are also presented. As a result, the memory capacity of the prototype MPLD was 49152 bits, the core area was 1767.54 × 1690.96 µm², and the number of metal layers used for wiring was three. The achieved configuration time is about 6.6 µsec for the whole prototype MPLD. 
The configuration speed of the prototype MPLD is about 11.7 times higher than AS configuration used for Altera FPGAs.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127912054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
The Shape of Things to Come: Future Potential of "Heavy Node" Multi-Core HPC Architectures
P. Kogge
{"title":"The Shape of Things to Come: Future Potential of \"Heavy Node\" Multi-Core HPC Architectures","authors":"P. Kogge","doi":"10.1109/IWIA.2008.13","DOIUrl":"https://doi.org/10.1109/IWIA.2008.13","url":null,"abstract":"The Top 500 list has been tracking supercomputers since the early 1990s. The bulk of those systems, especially recently, have been built from leading edge commodity microprocessors. This paper analyzes potential future characteristics of such systems in the light of the advent of power-constrained multi-core microprocessors. The resulting predictions indicate that such systems will not be able, by themselves, to keep up traditional trend lines.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131550149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Low-Power and High-Performance Communication Mechanism for Dependable Embedded Systems
T. Hanawa, T. Boku, Shin'ichi Miura, Takayuki Okamoto, M. Sato, K. Arimoto
{"title":"Low-Power and High-Performance Communication Mechanism for Dependable Embedded Systems","authors":"T. Hanawa, T. Boku, Shin'ichi Miura, Takayuki Okamoto, M. Sato, K. Arimoto","doi":"10.1109/IWIA.2008.8","DOIUrl":"https://doi.org/10.1109/IWIA.2008.8","url":null,"abstract":"Recently, a multi-core processor has been used to improve the performance and to reduce the power consumption. In order to acquire higher performance, multiprocessor connected with the network can enlarge the processing power. Dependability is also important for the embedded system to protect from a fault and failure. We develop a parallel platform for dependable embedded system, and investigate the low-power, reliable, and high-performance communication mechanism for such platform. In this study, we propose a communicator with communication links using PCI Express Gen2, and it denotes that maximum bandwidth is 2GB/s and several watts is required for power consumption. Moreover, this platform provides fault tolerance using redundancy.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"579 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133321842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Unified Programming Environment for Heterogeneous Distributed Parallel Systems
S. Hirasawa, H. Honda
{"title":"Unified Programming Environment for Heterogeneous Distributed Parallel Systems","authors":"S. Hirasawa, H. Honda","doi":"10.1109/IWIA.2008.16","DOIUrl":"https://doi.org/10.1109/IWIA.2008.16","url":null,"abstract":"Parallel execution environment, such as the multi-core CPU, a cluster, and a grid, has spread increasingly. The change from a homogeneous core based CPU and a shared memory to the distributed memory and the heterogeneous core based CPU is making system architecture complicated. The programming interface and programming model which are different in each parallel execution environment are used. Since this serves as a burden for users, it has barred the spread of parallel execution environment. In this paper, the execution model which treats such system architecture systematically is explored. This performs the unified programming interface for heterogeneous distributed memory system architecture.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122994822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic Application of Last-Touch Instructions for Leakage Energy Reduction
Kiyofumi Tanaka, Junji Yamano
{"title":"Automatic Application of Last-Touch Instructions for Leakage Energy Reduction","authors":"Kiyofumi Tanaka, Junji Yamano","doi":"10.1109/IWIA.2008.10","DOIUrl":"https://doi.org/10.1109/IWIA.2008.10","url":null,"abstract":"Recently, energy dissipation in microprocessors is getting larger, which leads to a serious problem in terms of allowable temperature and performance improvement for future microprocessors. Cache memory is effective in bridging a growing speed gap between a processor and relatively slow external main memory, and has increased in its size. However, energy dissipation in the cache memory will approach or exceed 50% of the increasing total energy dissipation in processors. An important point to note is that, in the near future, static (leakage) energy will dominate the total energy consumption in deep sub-micron processes. This paper describes a code-generation technique that utilizes special instructions for energy reduction. The instructions perform self-invalidation in cache memories, which leads to efficient energy reduction. We developed a source-to-source-code compiler that implements the automatic code-generation. The simulation results of the generated codes show that our techniques can reduce a substantial amount of leakage energy without large performance degradation.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127800304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Effect of Reordering Internal Messages in MPI Broadcast According to the Load Imbalance
T. Soga, T. Nanri, M. Kurokawa, K. Murakami
{"title":"Effect of Reordering Internal Messages in MPI Broadcast According to the Load Imbalance","authors":"T. Soga, T. Nanri, M. Kurokawa, K. Murakami","doi":"10.1109/IWIA.2008.14","DOIUrl":"https://doi.org/10.1109/IWIA.2008.14","url":null,"abstract":"To achieve higher scalability of parallel programs on large scale parallel computers, reducing the time spent for collective communications is one of the most important issues. In this paper, a dynamic optimization method to adjust the implementation of the Broadcast operation, one of the most popular collective communications, is introduced. Though there have been many attempts to speed up this operation, they assume that each rank starts this operation at the same time. However, in real execution, the time can be different because of load-imbalance among ranks. This paper first claims that this difference can increase the cost of this operation. Then, as a method to avoid this problem, an optimization method that adjusts the order of point-to-point messages in Broadcast operations is introduced. 
This method uses the wait time of each rank at the operation to determine the status of load-imbalance. From the results of experiments, it is shown that this optimization method can reduce the time for the operation. In addition to that, it is also shown that the effect of the optimization depends on the size of data to be broadcast and the amount of load-imbalance.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116741459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Register File Reliability Analysis Through Cycle-Accurate Thermal Emulation
J. Ayala, Pablo G Del Valle, David Atienza Alonso
{"title":"Register File Reliability Analysis Through Cycle-Accurate Thermal Emulation","authors":"J. Ayala, Pablo G Del Valle, David Atienza Alonso","doi":"10.1109/IWIA.2008.7","DOIUrl":"https://doi.org/10.1109/IWIA.2008.7","url":null,"abstract":"Continuous transistor scaling due to improvements in CMOS devices and manufacturing technologies is increasing processor power densities and temperatures; thus, creating challenges when trying to maintain manufacturing yield rates and devices which will be reliable throughout their lifetime. New microarchitectures require new reliability-aware design methods that can face these challenges without significantly increasing cost and performance. In this paper we present a complete analysis of reliability for the register file architecture of the Leon 3 processor. The analysis conducted is supported by the use of an accurate HW/SW FPGA-based emulation platform that enables a complete design space exploration of thermal and reliability metrics during the execution of an extended set of benchmarks, in a very limited amount of time. The effect of various compiler optimizations and register assignments on the reliability of the register file is then analyzed. 
Our results quantify the respective effects of these different factors and enable us to design a reliability-aware register file assignment policy that consistently improves the Mean-Time-To-Failure figure (20% on average) for the various types of applications.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123703821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Design and Power Performance Evaluation of On-Chip Memory Processor with Arithmetic Accelerators
C. Takahashi, M. Sato, D. Takahashi, T. Boku, A. Ukawa, Hiroshi Nakamura, Hidetaka Aoki, H. Sawamoto, N. Sukegawa
{"title":"Design and Power Performance Evaluation of On-Chip Memory Processor with Arithmetic Accelerators","authors":"C. Takahashi, M. Sato, D. Takahashi, T. Boku, A. Ukawa, Hiroshi Nakamura, Hidetaka Aoki, H. Sawamoto, N. Sukegawa","doi":"10.1109/IWIA.2008.9","DOIUrl":"https://doi.org/10.1109/IWIA.2008.9","url":null,"abstract":"In this paper, we design an on-chip memory processor with arithmetic accelerators, which are expected to improve power consumption. In addition, we evaluate the power performance of the processor. We propose implementing vector-type arithmetic accelerators and SIMD-type arithmetic accelerators in the on-chip memory processor. The evaluation results obtained using our simulator indicate that the performance of the 4FMAs SIMD-type accelerators is similar to that of the 4FMAs vector-type accelerators on DAXPY, Livermore kernel 1 and 3. However, the performance of the 4FMAs vector-type accelerator exceeds that of the 4FMAs SIMD-type accelerator with respect to matrix multiplication and QCD because of difference in element size of the registers. On Livermore kernel 7, the power performance of the 4FMAs SIMD-type accelerators exceeds that of the 4FMAs vector-type because of register reuse. 
However, the 16FMAs vector-type accelerators have an advantage in almost all simulations, excluding main memory bandwidth intensive benchmarks.","PeriodicalId":220234,"journal":{"name":"2008 International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131045180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4