Hao Zhang, Hiroki Matsutani, M. Koibuchi, H. Amano
{"title":"Dynamic power on/off method for 3D NoCs with wireless inductive-coupling links","authors":"Hao Zhang, Hiroki Matsutani, M. Koibuchi, H. Amano","doi":"10.1109/CoolChips.2013.6547924","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547924","url":null,"abstract":"Network-on-Chips (NoCs) with wireless inductive coupling have been utilized in real heterogeneous multicore systems. Although inductive coupling itself is energy-efficient (e.g., 0.14 pJ per bit [1]), inductors continuously consume a certain amount of power regardless of packet transfers. That is, inductors waste significant power, especially when the utilization of vertical links (i.e., inductors) is low. This is a typical use case of 3-D ICs, in which most communication occurs within a chip while communication between chips is infrequent. Such power can be reduced by shutting down the link by controlling the bias voltage of the transistors used in the transmitter and receiver. Here, we propose generalized link on-off techniques for wireless NoCs with irregular network topologies. The simulation shows that the proposed low-power techniques reduce power consumption by 43.8%-55.0%.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124679010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architecture level TSV count minimization methodology for 3D tree-based FPGA","authors":"V. Pangracious, H. Mehrez, Z. Marrakchi","doi":"10.1109/CoolChips.2013.6547925","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547925","url":null,"abstract":"CMOS technology scaling has greatly improved the overall performance and density of Field Programmable Gate Arrays (FPGAs); nonetheless, the performance gap between FPGAs and ASICs has remained very wide, mainly due to the programming overhead of FPGAs. Three-dimensional (3D) integration is a promising technology to reduce wire lengths. Through-silicon vias (TSVs) provide electrical connectivity between multiple active device planes in 3D integrated circuits (ICs). TSVs require significant silicon area compared to planar interconnects and also bring critical challenges to the design of 3D ICs. In this paper we propose an architecture-level TSV count optimization solution that minimizes the TSV count without compromising chip performance. The experimental results show that we are able to reduce the TSV count by 40% in a 3D tree-based FPGA.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"46 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120921640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Ishizaka, Takamichi Miyamoto, S. Akimoto, A. Iketani, T. Hosomi, J. Sakai
{"title":"Power efficient realtime super resolution by virtual pipeline technique on a server with manycore coprocessors","authors":"K. Ishizaka, Takamichi Miyamoto, S. Akimoto, A. Iketani, T. Hosomi, J. Sakai","doi":"10.1109/CoolChips.2013.6547918","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547918","url":null,"abstract":"Super Resolution image processing (SR) is a heavy task for today's mid-range Xeon servers. To accelerate SR, we utilize a server system with a manycore coprocessor, the Intel Xeon Phi coprocessor. The function offload model is the usual execution model for such systems. However, with this model it is difficult for SR to increase the utilization of both host processors and coprocessors. We propose a virtual pipeline model that can fully utilize both processors. Experimental results show that our SR improves performance by 3.3 times and performance/watt by 1.5 times. Our SR achieves 30 frames per second from SD to HD.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114543905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Y. Masubuchi, B. Gyselinckx, M. McCool, S. Momose, James Myers, Toshio Yoshida
{"title":"Panel discussions the next step in processor evolution","authors":"Y. Masubuchi, B. Gyselinckx, M. McCool, S. Momose, James Myers, Toshio Yoshida","doi":"10.1109/CoolChips.2013.6547915","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547915","url":null,"abstract":"Processor performance and functional improvement has been driven by innovations in various areas, such as architecture, circuit, device, and software, but we are now facing or will face hard challenges, such as power consumption and process technology. What would the next step be in processor evolution? This panel will discuss with experts from HPC, server and embedded area what the requirements of the processors would be for future applications, what technology would give solutions to them, and what the challenges would be.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122910365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dominic Hillenbrand, Akihiro Hayashi, Hideo Yamamoto, K. Kimura, H. Kasahara
{"title":"Automatic parallelization, performance predictability and power control for mobile-applications","authors":"Dominic Hillenbrand, Akihiro Hayashi, Hideo Yamamoto, K. Kimura, H. Kasahara","doi":"10.1109/CoolChips.2013.6547919","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547919","url":null,"abstract":"Currently few mobile applications exploit the power- and performance capabilities of multi-core architectures. As the number of cores increases, the challenges become more pressing. We picked three challenges: application parallelization, performance-predictability/portability and power control for mobile devices. We tackled the challenges with our auto-parallelizing compiler and operating system enhancements.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115526724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junyoung Park, Injoon Hong, Gyeonghoon Kim, Youchang Kim, K. Lee, Seongwook Park, Kyeongryeol Bong, H. Yoo
{"title":"A multi-granularity parallelism object recognition processor with content-aware fine-grained task scheduling","authors":"Junyoung Park, Injoon Hong, Gyeonghoon Kim, Youchang Kim, K. Lee, Seongwook Park, Kyeongryeol Bong, H. Yoo","doi":"10.1109/CoolChips.2013.6547917","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547917","url":null,"abstract":"Multiple granularity parallel core architecture is proposed to accelerate object recognition with low area and energy consumption. By adopting task-level optimized cores with different parallelism and complexity, the proposed processor achieves real-time object recognition with 271.4 GOPS peak performance. In addition, content-aware fine-grained task scheduling is proposed to enable low power real-time object recognition on 30fps 720p HD video streams. As a result, the object recognition processor achieves 9.4nJ/pixel energy efficiency and 25.8 GOPS/W·mm2 power-area efficiency in 0.13-μm CMOS technology.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115206398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RXv2 processor core for low-power microcontrollers","authors":"S. Otani, N. Ishikawa, H. Kondo","doi":"10.1109/CoolChips.2013.6547914","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547914","url":null,"abstract":"We have developed a new processor architecture for microcontrollers that integrate high-capacity FLASH memory and many peripheral functional modules. This paper describes the processor core architecture for low-power microcontrollers and our approach to reducing energy consumption with instruction fetch mechanisms. A large fraction of the total power budget of the microcontroller is the energy consumed in the path from the FLASH memory to the processor. An enhanced instruction set and pipeline structure provide an effective balance between high code density, power consumption, and high processing performance, with a novel prefetching unit to reduce the number of memory accesses.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116468828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yohei Kanehagi, Dan Umeda, Akihiro Hayashi, K. Kimura, H. Kasahara
{"title":"Parallelization of automotive engine control software on embedded multi-core processor using OSCAR compiler","authors":"Yohei Kanehagi, Dan Umeda, Akihiro Hayashi, K. Kimura, H. Kasahara","doi":"10.1109/CoolChips.2013.6547921","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547921","url":null,"abstract":"Next-generation automobiles are required to be safer, more comfortable, and more energy-efficient. These requirements can be realized by integrated control systems with enhanced electric control units, or by real-time control systems such as engine control and enhanced information systems such as recognition of people and other cars and navigation that accounts for traffic conditions, including during natural disasters. For example, more sophisticated engine control algorithms require higher-performance microprocessors to satisfy real-time constraints. The use of multi-core processors is a promising approach to realizing integrated control systems for next-generation automobiles. For multi-core processors in automotive control, previous work has improved reliability through redundant calculation [1] and throughput through functional distribution [2], rather than improving response time or performance through parallel processing. To the best of our knowledge, parallel processing of automotive control software to reduce response time has not succeeded on multi-core processors because such programs consist of conditional branches and small basic blocks. In contrast, this paper is the first to successfully parallelize practical automotive engine control software using an automatic multigrain parallelizing compiler, the OSCAR compiler, which the authors have been developing for more than 25 years. The OSCAR compiler parallelizes automotive programs by exploiting coarse-grain task parallelism, with newly developed parallelism enhancement methods such as branch duplication, instead of loop parallelism. The performance of hand-written engine control programs provided by Toyota Motor Corp. is evaluated on the RP-X, which has eight SH4A processor cores and was developed by Renesas, Hitachi, Tokyo Institute of Technology, and Waseda University. The evaluation shows a speedup of 1.54 times with 2 processor cores compared with ordinary sequential execution.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125898809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hidetomo Kobayashi, K. Kato, Takuro Ohmaru, S. Yoneda, T. Nishijima, Shuhei Maeda, K. Ohshima, H. Tamura, Hiroyuki Tomatsu, T. Atsumi, Y. Shionoiri, Y. Maehashi, J. Koyama, S. Yamazaki
{"title":"Processor with 4.9-μs break-even time in power gating using crystalline In-Ga-Zn-oxide transistor","authors":"Hidetomo Kobayashi, K. Kato, Takuro Ohmaru, S. Yoneda, T. Nishijima, Shuhei Maeda, K. Ohshima, H. Tamura, Hiroyuki Tomatsu, T. Atsumi, Y. Shionoiri, Y. Maehashi, J. Koyama, S. Yamazaki","doi":"10.1109/CoolChips.2013.6547913","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547913","url":null,"abstract":"A processor having a power management unit (PMU) and an 8-bit CPU including flip-flops with shadow memories is fabricated by 0.5-μm Si and 0.8-μm c-axis-aligned crystalline In-Ga-Zn-oxide (CAAC-IGZO) technology. The shadow memories hold data without power supply utilizing low off-state current of CAAC-IGZO FETs. A break-even time (BET) of 4.9μs has been obtained. Good scalability of the processor in writing data to shadow memories and in area (5.7% overhead or less) is also confirmed through simulation and layout, based on flip-flops using 30-nm Si FETs combined with 0.3-μm CAAC-IGZO FETs which show good electronic characteristics and no overhead in area.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131285679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tetsuro Honmura, Yuki Kondo, Tetsuya Yamada, M. Takada, Takumi Nitoh, Tohru Nojiri, Keisuke Toyama, Yasuhiko Saitoh, H. Nishi, Mikiko Sato, M. Namiki
{"title":"Hardware support for resource partitioning in real-time embedded systems","authors":"Tetsuro Honmura, Yuki Kondo, Tetsuya Yamada, M. Takada, Takumi Nitoh, Tohru Nojiri, Keisuke Toyama, Yasuhiko Saitoh, H. Nishi, Mikiko Sato, M. Namiki","doi":"10.1109/CoolChips.2013.6547922","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547922","url":null,"abstract":"Today's embedded systems require multiple functions, such as real-time control and information technology, and integrating these functions on a multi-core processor is one effective solution. However, this increases overhead because resources must be partitioned to protect them. We developed hardware support called ExVisor/XVS to reduce the overhead of partitioning resources and achieve real-time characteristics. It features a physical address management module (PAM) that performs direct address translation using a single-level page table based on an embedded system's memory usage. We evaluated the overhead of a virtual machine's (VM) resource access through register transfer level (RTL) simulation and implementation on a field-programmable gate array (FPGA), and it was less than 5.6% compared with the resource access time of a single-core processor.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126205614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}