{"title":"A low power DRAM refresh control scheme for 3D memory cube","authors":"Ying Wang, Yinhe Han, Huawei Li","doi":"10.1109/CoolChips.2014.6842950","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842950","url":null,"abstract":"We propose a low power refresh control scheme for 3D stacked DRAM memory, which leverages the data-pattern dependence characteristics of the cells' Retention-Time to squeeze the margin of refresh interval. It is a systematic approach that uses our proposed Retention-Time (RT) detection mechanism to capture the bottleneck that contributes to over-frequent refresh operations: “weak” cells with relatively shorter Retention-Time than others. With the help of memory scrubbers and Error Correction Pointer (ECP) table integrated on logic base of 3D memory cube, we can avoid the worst-case operation by locating the true “weak” cells sensitized by application and adapting the refresh rate to the data layout under our loop-based control algorithm. As shown in experiments, the method dramatically saves memory energy and bandwidth consumption.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121825062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aggressive use of Deep Sleep mode in low power embedded systems","authors":"Jun'ichi Segawa, Yusuke Shirota, K. Fujisaki, Tetsuro Kimura, Tatsunori Kanai","doi":"10.1109/CoolChips.2014.6842956","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842956","url":null,"abstract":"Since idle-state is the dominant state for embedded systems, disabling unused devices in idle-states can lead to significant power reduction. Among the various sleep modes provided by application processors, Deep Sleep mode offers maximum power savings. Since Deep Sleep mode requires to stop I/O devices and clocks, it is usually used in suspend-state. However, with the emergence of non-volatile or low power compute state retainable devices, we can now explore exploiting Deep Sleep mode in non-suspend states. We propose a new scheme to aggressively use Deep Sleep mode under normal operations. An experimental result of 48-80% power reduction on our prototype board indicates possibilities for near-future mobile platform running solely on photovoltaic-power.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130431092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An energy optimization method for vector processing mechanisms","authors":"Ye Gao, Masayuki Sato, Ryusuke Egawa, H. Takizawa, Hiroaki Kobayashi","doi":"10.1109/CoolChips.2014.6842957","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842957","url":null,"abstract":"In order to achieve a low energy execution for any multimedia applications (MMAs) on a vector processing mechanism (VPM), the number of parallel arithmetic pipelines and the number of cache ports of VPM must be properly configured for each MMAs. Therefore, this paper proposes an energy optimization method for VPMs (EOM-VP), which finds the lowest energy configuration by using the greedy searching method and an analytical model. As the evaluation results suggest, EOM-VP could find the lowest or the second lowest energy configuration for all the benchmark programs in the evaluation.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128817597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fine grained power management supported by just-in-time compiler","authors":"Motoki Wada, Mikiko Sato, M. Namiki","doi":"10.1109/CoolChips.2014.6842958","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842958","url":null,"abstract":"A low-power computing is now on high demand for both high performance computing and mobile computing. This research suggests the framework for controlling finely grained power saving hardware such as power gating, based on on-time analysis supported by JIT Compiler. By adapting the framework to control over fine-grained power gating control, the authors have succeeded to reduce the leakage power of processor by the maximum of 22%, and the average of 6%.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128132770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Establishing a standard interface between multi-manycore and software tools - SHIM","authors":"Masaki Kondo, F. Arakawa, M. Edahiro","doi":"10.1109/CoolChips.2014.6842946","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842946","url":null,"abstract":"The multicore processors are becoming norm and a processor with even more than a hundred of cores are emerging. These inherently require wide range of software tools to help software developers. However, supporting these complex hardware by the tools require significant effort by the tool vendors, and each invest in adapting the new hardware by modifying their tools or creating proprietary configuration files, while often the similar set of hardware architectural information are needed. The SHIM, Software-Hardware Interface for Multi-many-core, is a joint industrial and academic effort to standardize the interface between the multicore hardware and the software tools. This extended abstract introduces SHIM, the overall architecture, the schema used, the use-cases, and a prototype tool to foster the adaption of the interface.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114895174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flexibly fault-tolerant FU array processor and its self-tuning scheme to locate permanently defective unit","authors":"Jun Yao, Y. Nakashima, Mitsutoshi Saito, Yohei Hazama, Ryosuke Yamanaka","doi":"10.1109/CoolChips.2014.6842951","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842951","url":null,"abstract":"In this work, we propose the Explicit Redundancy Linear Array (EReLA) architecture to provide a highly flexible fault-toleration, which effectively utilizes its rich resources in a functional unit (FU) array for both the error detection and the fail-safe hot-swap after taking a permanent fault. For the preparation of the hot-swap, a self-tuning scheme is proposed specifically to fast locate the precise position of the permanently defective units, which can be either the computational, LD/ST FUs, or the connecting network as well. EReLA can thereby isolates the permanently defective unit at the smallest granularity, which allows more hot-swaps and extends accordingly the lifespan of the whole processor. Given these schemes, EReLA is functionally same to a traditional TMR processor in terms of fault toleration, while the power data of a 180nm prototype EReLA chip has indicated that it incurs far less power consumption than the TMR implementation.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122003582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Language runtime support for NVM/DRAM hybrid main memory","authors":"Gaku Nakagawa, S. Oikawa","doi":"10.1109/CoolChips.2014.6842949","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842949","url":null,"abstract":"Replacing of DRAM in main memory with non-volatile memory (NVM) has several merits. However, NVM under development has some limitations in write operation. To overcome it, some previous researches proposed NVM/DRAM hybrid memory architecture. In the architecture, it needs to determine data placements between NVM and DRAM. In this paper, we advocate that programming language runtimes are useful for management of NVM/DRAM hybrid main memory. In addition, we will propose a method to manage NVM/DRAM hybrid main memory with language runtime support.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130641500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel data race detection using debug register in Linux","authors":"Yunyun Jiang, Yi Yang, Tian Xiao, Tianwei Sheng, Wenguang Chen","doi":"10.1109/CoolChips.2014.6842953","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842953","url":null,"abstract":"Data races in parallel programs are notoriously difficult to detect and resolve. Existing research has mostly focused on data race detection at the user level and significant progress has been made in this regard. It is difficult to apply detection methods designed for user-level applications to identify OS kernel level races. In this paper, we present a new detection tool that is able to effectively detect race conditions in the Linux kernel environment. We use a dynamic detection approach, employing hardware debug registers available on commodity processors, to catch races on the fly during runtime. Preliminary experimental results show that our tool can effectively identify real data race instances.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129696209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Perpetuum Mobile 32bit CPU with 13.4pJ/cycle, 0.14µA sleep current using Reverse Body Bias Assisted 65nm SOTB CMOS technology","authors":"K. Ishibashi, N. Sugii, K. Usami, H. Amano, Kazutoshi Kobayashi, C. Pham, H. Makiyama, Yoshiki Yamamoto, H. Shinohara, T. Iwamatsu, Y. Yamaguchi, H. Oda, T. Hasegawa, S. Okanishi, H. Yanagita, S. Kamohara, M. Kadoshima, K. Maekawa, T. Yamashita, Duc-Hung Le, T. Yomogita, M. Kudo, K. Kitamori, Shuya Kondo, Yuuki Manzawa","doi":"10.1109/CoolChips.2014.6842954","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842954","url":null,"abstract":"A 32-bit CPU which operates with the lowest energy of 13.4 pJ/cycle at 0.35V and 14MHz, operates at 0.22V to 1.2V and with 0.14μA sleep current is demonstrated. The low power performance is attained by Reverse-Body-Bias-Assisted 65nm SOTB CMOS (Silicon On Thin Buried oxide) technology. The CPU can operate more than 100 years with 610mAH Li battery.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"128 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133256260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A globally asynchronous locally synchronous DMR architecture for aggressive low-power fault toleration","authors":"Yuttakon Yuttakonkit, Jun Yao, Y. Nakashima","doi":"10.1109/CoolChips.2014.6842952","DOIUrl":"https://doi.org/10.1109/CoolChips.2014.6842952","url":null,"abstract":"Recently, dual or triple modular redundancy (DMR/TMR) has been commonly used in high-end server or special environment targeted microprocessors to mitigate single event effects (SEEs), as the miniaturized transistors tend to be more vulnerable to SEEs. However, facing the issue that DMR and TMR usually add remarkable pressures to the power consumption due to the highly redundant executions, this work specially provides an architectural solution to introduce aggressive dynamic voltage scaling (DVS) and Razor-FF on DMR architecture to moderate the total energy. As the traditional DMR architecture with a globally synchronous clock will have visible performance down-gradation when DVS and Razor-FF are used, in this work, we propose a DMR processor architecture that uses dedicated clocks on each DMR module, following a globally asynchronous locally synchronous (GALS) execution fashion. In the execution, due to the possible timing faults from the aggressively lowered voltage, the two modules may experience a dynamically phase-shift clock frequency. Our GALS DMR approach is assembled with FIFOs and delay buffers to conceal the effect from this phase-shift and thereby the performance impact is largely alleviated. Compared to the traditional synchronous DMR system, we can have around 10% performance improvement by this asynchronous scheme when a same power reduction ratio is assumed. Also, we have aggressively turned down the voltage and achieved a 12% better MIPS/W than the previous DMR without major performance influence.","PeriodicalId":366328,"journal":{"name":"2014 IEEE COOL Chips XVII","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123479339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}