Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, H. Takizawa, Hiroaki Kobayashi
{"title":"A flexible insertion policy for dynamic cache resizing mechanisms","authors":"Masayuki Sato, Yusuke Tobo, Ryusuke Egawa, H. Takizawa, Hiroaki Kobayashi","doi":"10.1109/CoolChips.2013.6547923","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547923","url":null,"abstract":"This paper proposes a novel cache replacement policy, named the flexible insertion policy (FLEXII), for dynamic cache resizing mechanisms. FLEXII can reduce the number of dead-on-fill blocks, which are never reused in a cache memory, and help the mechanisms further reduce the energy consumption. The experimental results indicate that FLEXII can reduce the energy consumption of the cache memory by up to 68%, and by 27% on average, without significant performance degradation.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128298115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei Wang, Jun Yao, Youhui Zhang, Wei Xue, Y. Nakashima, Weimin Zheng
{"title":"HW/SW approaches to accelerate GRAPES in an FU array","authors":"Wei Wang, Jun Yao, Youhui Zhang, Wei Xue, Y. Nakashima, Weimin Zheng","doi":"10.1109/CoolChips.2013.6547920","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547920","url":null,"abstract":"In this research, GRAPES, a high-performance computing weather forecasting application, has been tuned for a functional unit (FU) array based architecture. Software and hardware approaches are employed to increase data locality and data reuse, and thereby accelerate the stencil computation in GRAPES. The simulation results indicate that we can achieve a per-core average IPC of 12.3 within a 20-stage FU array processor, which is 5.8x more power-efficient than a many-core processor (MCP) in the same process technology. This can accordingly slow down the increase of communication by one order in the cluster system, resulting in a 12x power-efficiency boost in all PEs.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126959684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N. Miura, Yusuke Koizumi, Eiichi Sasaki, Yasuhiro Take, Hiroki Matsutani, T. Kuroda, H. Amano, Ryuichi Sakamoto, M. Namiki, K. Usami, Masaaki Kondo, Hiroshi Nakamura
{"title":"A scalable 3D heterogeneous multi-core processor with inductive-coupling thruchip interface","authors":"N. Miura, Yusuke Koizumi, Eiichi Sasaki, Yasuhiro Take, Hiroki Matsutani, T. Kuroda, H. Amano, Ryuichi Sakamoto, M. Namiki, K. Usami, Masaaki Kondo, Hiroshi Nakamura","doi":"10.1109/CoolChips.2013.6547916","DOIUrl":"https://doi.org/10.1109/CoolChips.2013.6547916","url":null,"abstract":"A scalable heterogeneous multi-core processor is developed. 3D heterogeneous chip stacking of a general-purpose CPU and reconfigurable multi-core accelerators improves computational energy efficiency through proper task assignment and massively parallel computing. The stacked chips interconnect through a scalable 3D Network on Chip (NoC). By simply changing the number of stacked accelerator chips, processor parallelism can be widely scaled. In combination with Dynamic Voltage and Frequency Scaling (DVFS), the energy efficiency can be optimized for various performance requirements. No design change is needed, and hence no additional Non-Recurring Engineering (NRE) cost is incurred. An inductive-coupling ThruChip Interface (TCI) is applied to stacked-chip communications, forming a low-cost and robust high-speed 3D NoC. A prototype demonstration system has been developed with 65nm CMOS test chips. Successful system operation, including 10 hours of continuous Linux OS operation, is confirmed for the first time.","PeriodicalId":340576,"journal":{"name":"2013 IEEE COOL Chips XVI","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126162864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}