Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'05): Latest Papers

SIMD optimization in COINS compiler infrastructure
Mitsugu Suzuki, Nobuhisa Fujinami, Takeaki Fukuoka, Tan Watanabe, I. Nakata
DOI: https://doi.org/10.1109/IWIA.2005.40
Published: 2005-01-17
Abstract: COINS is a compiler infrastructure that makes it easy to construct a new compiler by adding or modifying only part of its compiling and optimization features. SIMD optimization is a major advantage. We present an overview of COINS and some topics on its SIMD optimization.
Citations: 8

A multi-thread processor architecture based on the continuation model
T. Matsuzaki, S. Amamiya, M. Izumi, M. Amamiya
DOI: https://doi.org/10.1109/IWIA.2005.22
Published: 2005-01-17
Abstract: We are developing the Fuce processor based on the dataflow computing model. Fuce means fusion of communication and execution. To execute many threads efficiently on multiple thread execution units, the Fuce processor uses the exclusive multi-thread execution model. The core concept of this model is continuation-based multi-thread execution, which is derived from dataflow computing. The Fuce processor aims to fuse intra-processor execution and inter-processor communication: it treats both processing inside the processor and communication with outside processors as events, and executes each event as a thread. In this paper, we introduce the architecture of the Fuce processor and evaluate the concurrency performance of a Fuce processor described in VHDL. The results show that the processor has concurrency capability when there is sufficient thread-level parallelism.
Citations: 4
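
The continuation-based activation described in the abstract above can be pictured with a small sketch. This is not the Fuce design itself. It assumes a simplified model in which each thread has a hypothetical fan-in count, each finished thread signals its successors, and a thread becomes ready only once all of its expected signals have arrived. The names `Thread`, `pending`, and `run` are illustrative and do not come from the paper.

```python
from collections import deque


class Thread:
    """A thread that becomes ready once `pending` continuation signals arrive."""

    def __init__(self, name, fan_in, successors=()):
        self.name = name
        self.pending = fan_in              # signals still awaited (assumed model)
        self.successors = list(successors)  # threads to signal on completion


def run(threads, initial):
    """Run each ready thread to completion, then signal its successors."""
    by_name = {t.name: t for t in threads}
    ready = deque(initial)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)                 # the thread runs without interruption
        for succ in by_name[name].successors:
            target = by_name[succ]
            target.pending -= 1            # deliver one continuation signal
            if target.pending == 0:
                ready.append(succ)         # all inputs arrived: thread is ready
    return order


threads = [
    Thread("load_a", 0, ["add"]),
    Thread("load_b", 0, ["add"]),
    Thread("add", 2, ["store"]),
    Thread("store", 1),
]
order = run(threads, ["load_a", "load_b"])
print(order)
```

In this sketch `add` waits for both loads, which mirrors how a dataflow node fires only when all of its inputs are available. A real processor would dispatch ready threads to several execution units in parallel rather than from one queue.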

Preliminary evaluations of a FPGA-based-prototype of DIMMnet-2 network interface
N. Tanabe, A. Kitamura, T. Miyashiro, Y. Miyabe, T. Izawa, Y. Hamada, H. Nakajo, H. Amano
DOI: https://doi.org/10.1109/IWIA.2005.38
Published: 2005-01-17
Abstract: As interconnection networks for PC clusters improve in performance, a standard I/O bus such as the PCI bus becomes a bottleneck. DIMMnet is a network interface plugged into a memory slot instead of a standard I/O bus. This strategy is one way to keep pace with the growing performance of future microprocessors. DIMMnet-2 is a prototype that can be plugged into a DDR-DIMM slot to confirm its functions. In this paper, we outline the FPGA-based DIMMnet-2 prototype and the improvements from DIMMnet-1 to DIMMnet-2. Although DIMMnet-2 uses an FPGA instead of an ASIC, the latency for writing 8 bytes into remote memory is only 0.948 µs. This is about 3 times lower than the latency of QsNET II, a high-performance commercial network interface, plugged into a PCI-X bus on an Intel-based IA32 PC. For BOTF sending, the CoreLogic part of the FPGA-based DIMMnet-2 is 5.75 times as fast as that of DIMMnet-1.
Citations: 7

PRESTOR-1: a processor extending multithreaded architecture
K. Tanaka
DOI: https://doi.org/10.1109/IWIA.2005.39
Published: 2005-01-17
Abstract: Multithreaded processors are becoming widespread. Multithreaded architecture enables fast context switching to tolerate memory access latency and bridge synchronization gaps, and thus enables efficient utilization of execution pipelines. However, it cannot avoid all pipeline stalls: stalls still occur when all processor built-in threads are in a wait state, or when a task/process does not have enough threads to fill all available context slots, since the mechanism for switching active threads is effective only for the contexts of processor built-in threads. We developed a new multithreaded processor, PRESTOR-1, that increases the virtual number of built-in thread contexts and enables seamless task/thread switching by allocating and swapping task/thread contexts hierarchically between processor and memory in a multitasking environment. The processor supports real-time applications through hierarchical task/thread allocation based on task/thread priority, and through fast response mechanisms for interrupt requests that exploit the multiple-context architecture. Moreover, the processor has reconfigurable caches that provide a priority-based partitioning cache and a FIFO buffer. In this paper, we describe the details of PRESTOR-1.
Citations: 6

Superscalar processor with multi-bank register file
T. Hironaka, M. Maeda, K. Tanigawa, T. Sueyoshi, K. Aoyama, T. Koide, H. Mattausch, T. Saito
DOI: https://doi.org/10.1109/IWIA.2005.42
Published: 2005-01-17
Abstract: Register files in highly parallel superscalar processors tend to have a large chip area and many access ports. This trend causes problems with chip size, access time, and power consumption. As one method for solving these problems, we have proposed a multi-bank register file that achieves small area, high speed, and low power consumption. We have demonstrated the effectiveness of this method by software simulation, and by designing it in detail as a synthesizable Verilog-HDL description with a full-custom multi-bank register file. In this paper, we show the detailed architecture of a superscalar processor with the multi-bank register file and its evaluation results.
Citations: 5
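
The trade-off behind a multi-bank register file, as summarized in the abstract above, can be illustrated with a short sketch. It rests on assumptions that are not taken from the paper: registers are interleaved across banks by register number modulo the bank count, and each bank serves a fixed number of accesses per cycle. The function name `cycles_needed` and its parameters are hypothetical.

```python
from collections import Counter


def cycles_needed(registers, num_banks, ports_per_bank=1):
    """Cycles to serve all register accesses under the assumed banking scheme."""
    # Assumed mapping: register r lives in bank r % num_banks.
    per_bank = Counter(r % num_banks for r in registers)
    if not per_bank:
        return 0
    # The busiest bank determines the total; use ceiling division per bank.
    return max(-(-n // ports_per_bank) for n in per_bank.values())


# Four accesses spread over four banks: served in one cycle.
no_conflict = cycles_needed([0, 1, 2, 3], num_banks=4)
# Three accesses hit bank 0 (registers 0, 4, 8): they must be serialized.
conflict = cycles_needed([0, 4, 8, 3], num_banks=4)
print(no_conflict, conflict)
```

Each bank needs only a few ports, which is where the area and power savings would come from, at the cost of occasional serialization when accesses collide on one bank.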

A New Kind of Processor Interface for a System-on-Chip Processor with TIE Ports and TIE Queues of Xtensa LX
T. Tohara
DOI: https://doi.org/10.1109/IWIA.2005.23
Published: 2005-01-17
Abstract: Today, most System-on-a-Chip (SoC) ASIC chips integrate multiple processor cores as well as hard-wired RTL blocks to realize very complex applications. While the computation performance of processors increases, data throughput becomes the bottleneck. Moreover, as processors and RTL blocks need to share data and control/status, communication between processors and RTL blocks becomes a serious issue. While various system interconnects have been introduced, processor interface architecture remains conceptually the same. To overcome the communication bottleneck, this paper presents a new type of embedded processor interface for SoC design. As an actual realization of such an interface, the TIE ports and TIE queues of the Xtensa LX processor from Tensilica, Inc. are introduced.
Citations: 3

Performance evaluation of dynamic network reconfiguration using Detour-UD routing
T. Yoshinaga, Y. Nishimura
DOI: https://doi.org/10.1109/IWIA.2005.37
Published: 2005-01-17
Abstract: Fault tolerance is an emerging issue for massively parallel computers. This paper describes the performance impact of dynamic network reconfiguration protocols using a fault-tolerant, adaptive deadlock-recovery routing algorithm, Detour-UD, for k-ary n-cubes. We propose a scheme to specify unroutable packets by managing drain-flags in routing tables. We also propose two selective drainage protocols. One protocol drains the unroutable packets specified by the drain-flags after the reconfiguration process. The other protocol drains deadlocked packets to reduce the network load during the reconfiguration process. Our simulation results show that the first protocol helps reduce the number of drained packets, and the second maintains network throughput during the reconfiguration process.
Citations: 4

Continuum computer architecture for nano-scale and ultra-high clock rate technologies
T. Sterling, M. Brodowicz
DOI: https://doi.org/10.1109/IWIA.2005.27
Published: 2005-01-17
Abstract: Continuum computer architecture (CCA) is a non-von Neumann architecture that offers an alternative to conventional structures as digital technology evolves towards nano-scale and the ultimate flat-lining of Moore's law. Coincidentally, it also defines a model of architecture particularly well suited to logic classes that exhibit ultra-high clock rates (> 100 GHz), such as rapid single flux quantum (RSFQ) gates. CCA eliminates the concept of the "CPU" that has dominated computer architecture since its inception more than half a century ago, and establishes a new local element that merges the properties of state storage, state transfer, and state operation. A CCA system architecture is a simple multidimensional organization of these elemental blocks and physically may be considered a new family of cellular computer. But CCA differs dramatically from conventional cellular automata. While both deliver emergent global behavior from the aggregation of local rules and ensuing operation, the CCA emergent behavior is a global general-purpose model of parallel computation, as opposed to simply mimicking some limited phenomenon like heat and mass transfer, as conventional cellular automata do. This paper presents the motivation and foundation concepts of CCA and exposes key issues for further work.
Citations: 6

Performance comparison of vector-calculations between Itanium2 and other processors
T. Nanri, Y. Watanabe, H. Sato
DOI: https://doi.org/10.1109/IWIA.2005.36
Published: 2005-01-17
Abstract: This paper examines the performance similarity of the Intel Itanium2 processor and a vector processor. Measurements of vector calculations on the latest scalar processors show that Itanium2 shares similar performance strengths and weaknesses with the VPP5000. For multiplications of dense matrices, Itanium2 and VPP5000 show relatively high sustained performance relative to the theoretical peak. For matrix-vector multiplications with sparse matrices, on the other hand, both processors show poor performance.
Citations: 0

An exploration of the technology space for multi-core memory/logic chips for highly scalable parallel systems
P. Kogge
DOI: https://doi.org/10.1109/IWIA.2005.24
Published: 2005-01-17
Abstract: Chip-level multi-processing, where more than one CPU "core" shares the same die with significant parts of the memory hierarchy, is appearing with increasing frequency as standard design practice. This paper takes a broader look at how such mixed logic/memory dies may evolve in the future by walking through the latest CMOS roadmap projections and casting them in terms of the key chip-level system building blocks. Given the increasing importance of memory density in such systems, especially as we move to single chip-type designs, we pay particular attention to the potential use of leading-edge DRAM rather than SRAM for many memory structures. The roles of other factors, such as interconnect and power, are also considered.
Citations: 18