{"title":"Energy-efficient High-level Synthesis for HDR Architectures with Clock Gating Based on Concurrency-oriented Scheduling","authors":"Hiroyuki Akasaka, Shin-ya Abe, M. Yanagisawa, N. Togawa","doi":"10.2197/ipsjtsldm.6.101","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.6.101","url":null,"abstract":"With the miniaturization of LSIs and its increasing performance, demand for high-functional portable devices has grown significantly. At the same time, battery lifetime and device overheating are leading to major design problems hampering further LSI integration. On the other hand, the ratio of an interconnection delay to a gate delay has continued to increase as device feature size decreases. We have to estimate interconnection delays and reduce energy consumption even in a high-level synthesis stage. In this paper, we propose a high-level synthesis algorithm for huddle-based distributed-register architectures (HDR architectures) with clock gatings based on concurrency-oriented scheduling/functional unit binding. We assume coarse-grained clock gatings to huddles and we focus on the number of control steps, or gating steps, at which we can apply the clock gating to registers in every huddle. We propose two methods to increase gating steps: One is that we try to schedule and bind operations to be performed at the same timing. By adjusting the clock gating timings in a high-level synthesis stage, we expect that we can enhance the effect of clock gatings more than applying clock gatings after logic synthesis. The other is that we try to synthesize huddles such that each of the synthesized huddles includes registers which have similar or the same clock gating timings. At this time, we determine the clock gating timings to minimize all energy consumption including clock tree energy. The experimental results show that our proposed algorithm reduces energy consumption by a maximum of 23.8% compared with several conventional algorithms.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"47 1","pages":"101-111"},"PeriodicalIF":0.0,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91147408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Method to Reduce Energy Consumption of Conditional Operations with Execution Probabilities","authors":"Kazuhito Ito, Kazuhiko Kameda","doi":"10.2197/ipsjtsldm.6.60","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.6.60","url":null,"abstract":"In conditional processing, operations are executed conditionally based on the result of condition operations. While the speculative execution of conditional operations achieves higher processing speed, unnecessary energy may be consumed by the speculatively executed operations. In this paper, reduction of the energy consumption of conditional processing is considered for time and resource constrained processing. An efficient method to calculate the probability of operation execution is presented. Based on the probabilities of execution, a scheduling exploration with the simulated annealing and a heuristic scheduling algorithm are proposed to minimize the energy consumption of the conditional processing by reducing unnecessary speculative operations. The experimental results show 5% to 10% energy can be reduced by the proposed methods for the same configuration of resources.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"80 1","pages":"60-70"},"PeriodicalIF":0.0,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85353592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Loop Fusion with Outer Loop Shifting for High-level Synthesis","authors":"Y. Kato, Kenshu Seto","doi":"10.2197/ipsjtsldm.6.71","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.6.71","url":null,"abstract":"Loop fusion is often necessary before successful application of high-level synthesis (HLS). Although promising loop optimization tools based on the polyhedral model such as Pluto have been proposed, they sometimes cannot fuse loops into fully nested loops. This paper proposes an effective loop transformation called Outer Loop Shifting (OLS) that facilitates successful loop fusion. With HLS, we found that the OLS generates hardware with 25% less execution cycles on average than that only by Pluto for four benchmark programs.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"36 1","pages":"71-75"},"PeriodicalIF":0.0,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88860935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An FPGA Implementation of a HOG-based Object Detection Processor","authors":"Kosuke Mizuno, Yosuke Terachi, Kenta Takagi, S. Izumi, H. Kawaguchi, M. Yoshimoto","doi":"10.2197/ipsjtsldm.6.42","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.6.42","url":null,"abstract":"This paper describes a Histogram of Oriented Gradients (HOG)-based object detection processor. It features a simplified HOG algorithm with cell-based scanning and simultaneous Support Vector Machine (SVM) calculation, cell-based pipeline architecture, and parallelized modules. To evaluate the effectiveness of our approach, the proposed architecture is implemented onto a FPGA prototyping board. Results show that the proposed architecture can generate HOG features and detect objects with 40 MHz for SVGA resolution video (800 × 600 pixels) at 72 frames per second (fps).","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"22 1","pages":"42-51"},"PeriodicalIF":0.0,"publicationDate":"2013-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75285163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A non-volatile reconfigurable offloader for wireless sensor nodes","authors":"S. Nakaya, M. Miyamura, N. Sakimura, Yuichi Nakamura, T. Sugibayashi","doi":"10.1145/2460216.2460232","DOIUrl":"https://doi.org/10.1145/2460216.2460232","url":null,"abstract":"Energy saving is currently one of the most important issues in the development of battery-powered wireless sensor nodes (WSNs). We have developed a non-volatile reconfigurable offloader for flexible and highly efficient processing on WSNs that uses NanoBridges (NBs), which are novel non-volatile and reprogrammable switching elements. Non-volatility is essential for the intermittent operation of WSNs due to the requirement of power-on without loading configuration data. We implemented a data compression algorithm on the offloader that reduces energy consumption during data transmission. Simulation results showed that the energy consumption on the offloader was 1/21 of that on an ultra-low power CPU.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"75 1","pages":"52-59"},"PeriodicalIF":0.0,"publicationDate":"2012-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89903111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Communication and Synchronization for Embedded Multiprocessors Using ASIP Methodology","authors":"Hao Xiao, T. Isshiki, Dongju Li, H. Kunieda, Yuko Nakase, Sadahiro Kimura","doi":"10.2197/ipsjtsldm.5.118","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.5.118","url":null,"abstract":"Inter-processor communication and synchronization are critical problems in embedded multiprocessors. In order to achieve high-speed communication and low-latency synchronization, most recent designs employ dedicated hardware engines to support these communication protocols individually, which is complex, inflexible, and error prone. Thus, this paper motivates the optimization of inter-processor communication and synchronization by using application-specific instruction-set processor (ASIP) techniques. The proposed communication mechanism is based on a set of custom instructions coupled with a low-latency on-chip network, which provides efficient support for both data transfer and process synchronization. By using state-of-the-art ASIP design methodology, we embed the communication functionalities into a base processor, making the proposed mechanism feature ultra low overhead. More importantly, industry-standard compatible programming interfaces supporting both message-passing and shared-memory paradigms are exposed to end-users to ease the software porting. Experimental results show that the bandwidth of the proposed message-passing protocol can achieve up to 703 Mbyte/s @ 200 MHz, and the latency of the proposed synchronization protocol can be reduced by more than 81% when compared with the conventional approach. Moreover, as a case study, we also show the effectiveness of the proposed communication mechanism in a real-life embedded application, WiMedia UWB MAC.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"47 1","pages":"118-132"},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78389873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Exact Estimation Algorithm of Error Propagation Probability for Sequential Circuits","authors":"Masayoshi Yoshimura, Y. Akamine, Y. Matsunaga","doi":"10.2197/ipsjtsldm.5.63","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.5.63","url":null,"abstract":"In advanced integrated circuit technology, the soft error tolerance is low. Soft errors ultimately lead to failure in VLSIs. We propose a method for the exact estimation of error propagation probabilities in sequential circuits whose FFs latch failure values. The failure due to soft errors in sequential circuits is defined using the modified product machine. The modified product machine monitors whether failure values appear at any primary output. The behavior of the modified product machine is analyzed with the Markov model. The probabilities that the failure values latched into the flip-flops (FFs) appear at any primary output are calculated from the state transition probabilities of the modified product machine. The time required for solving simultaneous linear equations accounts for a large portion of the execution time. We also propose two acceleration techniques to enable the application of our estimation method to larger scale circuits. These acceleration techniques reduce the number of variables in simultaneous linear equations. We apply the proposed method to ISCAS'89 and MCNC benchmark circuits and estimate error propagation probabilities for sequential circuits. Experimental results show that total execution times for the proposed method with two acceleration techniques are up to 10 times lesser than the total execution times for a naive implementation.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"1 1","pages":"63-70"},"PeriodicalIF":0.0,"publicationDate":"2012-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77311471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-efficient High-level Synthesis for HDR Architectures","authors":"Shin-ya Abe, M. Yanagisawa, N. Togawa","doi":"10.2197/ipsjtsldm.5.106","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.5.106","url":null,"abstract":"","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"23 1","pages":"106-117"},"PeriodicalIF":0.0,"publicationDate":"2012-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90376999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Stackable LTE Chip for Cost-effective 3D Systems","authors":"W. Lafi, D. Lattard, A. Jerraya","doi":"10.2197/ipsjtsldm.5.2","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.5.2","url":null,"abstract":"To address the problem of prohibitive cost of advanced fabrication technologies, one solution consists in reusing masks to address a wide range of ICs. This could be achieved by a modular circuit that can be stacked to build TSV-based 3D systems with processing performance adapted to several applications. This paper focuses on 4G wireless telecom applications. We propose a basic circuit that meets the SISO (Single Input Single Output) transmission mode. By stacking multiple instances of this same circuit, it will be possible to address several MIMO (Multiple Input Multiple Output) modes. The proposed circuit is composed of several processing units interconnected by a 3D NoC and controlled by a host processor. Compared to a 2D reference platform, the proposed circuit keeps at least the same performance and power consumption in the context of 4G telecom applications, while reducing total cost. More generally, our cost analysis shows that 3D integration efficiency depends on the size of the circuit and the stacking option (die-to-die, die-to-wafer and interposer-based stacking).","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"63 1","pages":"2-13"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79650775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System-On-Chip for Biologically Inspired Vision Applications","authors":"Sungho Park, Ahmed Al-Maashri, K. Irick, A. Chandrashekhar, M. Cotter, Nandhini Chandramoorthy, M. DeBole, N. Vijaykrishnan","doi":"10.2197/ipsjtsldm.5.71","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.5.71","url":null,"abstract":"Neuromorphic vision algorithms are biologically-inspired computational models of the primate visual pathway. They promise robustness, high accuracy, and high energy efficiency in advanced image processing applications. Despite these potential benefits, the realization of neuromorphic algorithms typically exhibit low performance even when executed on multi-core CPU and GPU platforms. This is due to the disparity in the computational modalities prominent in these algorithms and those modalities most exploited in contemporary computer architectures. In essence, acceleration of neuromorphic algorithms requires adherence to specific computational and communicational requirements. This paper discusses these requirements and proposes a framework for mapping neuromorphic vision applications on a System-on-Chip, SoC. A neuromorphic object detection and recognition on a multi-FPGA platform is presented with performance and power efficiency comparisons to CMP and GPU implementations.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"54 1","pages":"71-95"},"PeriodicalIF":0.0,"publicationDate":"2012-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73513793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}