{"title":"[2010] VIX: A Router Architecture for Priority-Aware Networks-on-Chip","authors":"Takuma Kogo, N. Yamasaki","doi":"10.1109/IWIA.2010.15","DOIUrl":"https://doi.org/10.1109/IWIA.2010.15","url":null,"abstract":"In future many-core chip multiprocessor (CMP) and system-on-chip (SoC) architectures, the network-on-chip (NoC) will be one of the most critical components. In CMPs and SoCs, multiple applications will execute concurrently and interfere with each other, causing packet conflicts in the NoC. Priority control is required in such environments, because each application has different bandwidth requirements and generates different traffic patterns. Unfortunately, priority control degrades network performance and significantly increases the area of a priority-aware on-chip router. This paper proposes a router architecture for priority-aware NoCs that mitigates the performance and area overheads of priority control. We implement the proposed router architecture using a 90nm process technology. The synthesis results show no critical path overhead and a drastic reduction in router area. Simulation results on an 8-ary 2-mesh network show that the average latency of higher-priority packets is reduced near the saturation point.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124966690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[2010] OREX - An Optical Ring with Electrical Crossbar Hybrid Photonic Network-on-Chip","authors":"Cisse Ahmadou Dit Adi, Ping Qiu, H. Irie, T. Miyoshi, T. Yoshinaga","doi":"10.1109/IWIA.2010.13","DOIUrl":"https://doi.org/10.1109/IWIA.2010.13","url":null,"abstract":"The role of the network-on-chip (NoC) is becoming more important as the number of processing elements (PEs) integrated onto a single chip increases. Lowering power consumption while providing high-performance communication is a challenging problem for the design of future NoCs. In this paper we propose OREX, a hybrid NoC consisting of an optical ring and an electrical crossbar central router. OREX takes advantage of state-of-the-art electrical and optical technology designs to deliver a high-data-rate NoC at an acceptable power consumption cost. Using a cycle-accurate simulator, we evaluate the proposed hybrid NoC. Simulation experiments show that OREX achieves slightly better communication performance in terms of bandwidth and power consumption compared to a conventional hybrid photonic torus network.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"226 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121569019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[2009] A Stage-Level Recovery Scheme in Scalable Pipeline Modules for High Dependability","authors":"Jun Yao, Hajime Shimada, Kazutoshi Kobayashi","doi":"10.1109/IWIA.2010.11","DOIUrl":"https://doi.org/10.1109/IWIA.2010.11","url":null,"abstract":"In the recent years, the increasing error rate has become one of the major impediments for the application of new process technologies in electronic devices like microprocessors. This thereby necessitates the research of fault toleration mechanisms from all device, micro-architecture and system levels to keep correct computation in future microprocessors, along the advances of process technologies. Space redundancy, as dual or triple modular redundancy (DMR or TMR), is widely used to tolerate errors with a negligible performance loss. In this paper, at the micro-architecture level, we propose a very fine-grained recovery scheme based on a DMR processor architecture to cover every transient error inside of the memory interface boundary. Our recovery method makes full use of the existing duplicated hardware in the DMR processor, which can avoid large hardware extension by not using checkpoint buffers in many fault-tolerable processors. The hardware-based recovery is achieved by dynamically triggering an instruction re-execution procedure in the next cycle after error detection, which indicates a near-zero performance impact to achieve an error-free execution. A TMR architecture is usually preferred as it provides a seamless error correction by a majority voting logic and therefore generates no recovery delay. With our fast recovery scheme at a low hardware cost, our result shows that even under a relatively high transient error rate, it is possible to only use a DMR architecture to detect/recover errors at a negligible performance cost. Our reliable processor is thus constructed to use a DMR execution with the fast recovery as its major working mode. It saves around 1/3 energy consumption from a traditional TMR architecture, while the transient error coverage is still maintained.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126490912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[2010] Facing the Exascale Energy Wall","authors":"P. Kogge, P. Fratta, Megan Vance","doi":"10.1109/IWIA.2010.9","DOIUrl":"https://doi.org/10.1109/IWIA.2010.9","url":null,"abstract":"A recent report focused on the technical challenges in advancing from today's \"petascale\" systems to \"exascale.\" Power, or more accurately energy, was a dominant challenge. This paper briefly reviews the energy challenge for exascale-sized systems, with an emphasis on the relatively enormous energy costs of referencing operands from the memory hierarchy. Then, using a key step from the LINPACK benchmark, we investigate two different approaches to reducing such costs: one which migrates computations up from the host to higher levels of the hierarchy, and another in moving the whole computation closer to memory. Both show significant improvements over architecture as usual.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129768622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[2009] Exploring the Possible Past Futures of a Single Part Type Multi-core PIM Chip","authors":"P. Kogge","doi":"10.1109/IWIA.2010.8","DOIUrl":"https://doi.org/10.1109/IWIA.2010.8","url":null,"abstract":"Execube, a chip built in 1993, was most probably the world's first true multi-core microprocessor, the world's first Processing-In-Memory chip built on a DRAM process, and one of the earliest attempts to build a single part type chip out of which larger parallel processors could be built. This paper looks back on that chip and explores what would have happened if its development had continued through succeeding generations of technology. Several different scenarios are explored, with discussions as to what the capabilities would have been, and where limitations would have surfaced.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121364598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[2009] An Instruction Decomposition Method for Reconfigurable Decoders","authors":"Kazuhiro Yoshimura, Takashi Nakada, Y. Nakashima","doi":"10.1109/IWIA.2010.12","DOIUrl":"https://doi.org/10.1109/IWIA.2010.12","url":null,"abstract":"Embedded multimedia processors are required to execute many kinds of traditional instruction sets. Since decomposition and translation of instructions by software emulators have larger overhead than that by hardware units, an IPC on software emulators is lower than that on real processors. In this paper, we propose a new method for executing many kinds of traditional instruction sets. The method decomposes them into internal instructions based on information from memory. The memory-based decoder decomposes target CISC instructions into simple instructions. We evaluate an instruction decomposition method and the memory-based decoders. The average IPC of a memory-based decoder is 0.53, which is six times higher than that on JIT type software emulators. The total memory size of the decoder is 98 KB. The chip area of the processor that has the decoder using RAM is 1.36 times larger than that with a hardwired decoder. Therefore, we conclude that the proposed method provides a good tradeoff between chip area and performance.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123327372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[2010] Energy Efficiency Using Loop Buffer based Instruction Memory Organizations","authors":"Antonio Artés, F. Duarte, M. Ashouei, J. Huisken, J. Ayala, David Atienza Alonso, F. Catthoor","doi":"10.1109/IWIA.2010.10","DOIUrl":"https://doi.org/10.1109/IWIA.2010.10","url":null,"abstract":"Energy consumption in embedded systems is strongly dominated by instruction memory organizations. Based on this, any architectural enhancement introduced in this component will produce a significant reduction of the total energy budget of the system. Loop buffering is an effective scheme to reduce the energy consumption of the instruction memory organization. In this paper, a novel classification of architectural enhancements based on the use of the loop buffer concept is presented. Using this classification, an energy design space exploration is performed to show the impact on energy consumption in different application scenarios. From gate-level simulations, the energy analysis demonstrates that the instruction level parallelism of the system brings not only improvements in performance, but also improvements in the energy consumption of the system. The increase in instruction level parallelism makes it easier to adapt the sizes of the loop buffers to the sizes of the loops that form the application, because it gives more freedom to combine the execution of those loops.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128132069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[2010] Avoiding Side-Channel Attacks in Embedded Systems with Non-deterministic Branches","authors":"P. Malagón, Juan-Mariano de Goyeneche, Marina Zapater, Jose M. Moya","doi":"10.1109/IWIA.2010.14","DOIUrl":"https://doi.org/10.1109/IWIA.2010.14","url":null,"abstract":"In this paper, we suggest handling security in embedded systems by introducing a small architectural change. We propose the use of a non-deterministic branch instruction to generate non-determinism in the execution of encryption algorithms. Non-determinism makes side-channel attacks much more difficult. The experimental results show at least three orders of magnitude improvement in resistance to statistical side-channel attacks for a custom AES implementation, while enhancing its performance at the same time. Compared with previous countermeasures, this architectural-level hiding countermeasure is trivial to integrate in current embedded processor designs, offers similar resistance to side-channel attacks, while maintaining similar power consumption to the unprotected processor.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122318070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[2010] Combined Dynamic-Static Approach for Thermal-Awareness in Heterogeneous Data Centers","authors":"Marina Zapater, J. L. Risco-Martín, Z. Bankovic, J. Ayala, Jose M. Moya","doi":"10.1109/IWIA.2010.7","DOIUrl":"https://doi.org/10.1109/IWIA.2010.7","url":null,"abstract":"The thermal profile of data centers plays a significant role in the cooling cost and power budget of the system. While several dynamic and static approaches have been proposed so far, these have failed to consider the whole picture. This paper proposes a combined static and dynamic approach that shows the benefits of efficient scheduling strategies in leading to thermally efficient floorplans. The devised methodology yields a placement of processors and a task schedule for a heterogeneous system in which the main thermal metrics (maximum temperature and thermal gradient) have been optimized.","PeriodicalId":339844,"journal":{"name":"2010 International Workshop on Innovative Architecture for Future Generation High Performance","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125342378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}