2015 33rd IEEE International Conference on Computer Design (ICCD): latest papers, page 3

Improving the interface performance of synthesized structural FAME simulators through scheduling

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357086

D. Penry

Citations: 1

Runtime multi-optimizations for energy efficient on-chip interconnections

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357147

Yuan He, Masaaki Kondo, Takashi Nakada, Hiroshi Sasaki, Shinobu Miwa, Hiroshi Nakamura

Abstract: On-chip interconnection (or NoC) is a major performance and power contributor to modern and future multicore processors. So far, many optimization techniques have been developed to improve its bandwidth, latency and power consumption. But it is not clear how energy efficiency is affected, since an optimization technique normally comes with overheads. This paper thus attempts to address when and how such optimization techniques should be applied and tuned to help achieve better energy efficiency. We first model the performance and energy impacts of representative NoC optimization techniques. These models help us more easily understand the consequences of applying these optimization techniques and their combinations under different circumstances. Moreover, based on such modeling, we propose and implement an adaptive control over these NoC optimization techniques to improve both the performance and the energy efficiency of the network. Our results show that this proposal can achieve average improvements of 26% and 57% in network performance and energy-delay product, respectively.
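The energy-delay product (EDP) quoted in the abstract above is conventionally the product of energy consumed and execution time, with lower values being better. The abstract does not say how its improvement figures were computed, so the following is only a minimal sketch of one common way to express a relative EDP improvement. The function names and the sample measurements are illustrative assumptions, not values from the paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """EDP = energy x delay; lower is better."""
    return energy_joules * delay_seconds


def relative_improvement(baseline: float, optimized: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to the baseline."""
    return (baseline - optimized) / baseline


# Hypothetical measurements, for illustration only (not from the paper).
baseline_edp = energy_delay_product(energy_joules=2.0, delay_seconds=1.0)
optimized_edp = energy_delay_product(energy_joules=1.5, delay_seconds=0.8)
improvement = relative_improvement(baseline_edp, optimized_edp)
print(f"EDP improvement: {improvement:.0%}")
```

Because EDP multiplies the two quantities, an optimization that saves energy but lengthens execution time can still score worse than the baseline, which is why the abstract treats overheads as central.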

Citations: 1

FDRAM: DRAM architecture flexible in successive row and column accesses

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357146

Jeongjae Yu, Wooyoung Jang

Citations: 2

Exploring multiple sleep modes in on/off based energy efficient HPC networks

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357084

Karthikeyan P. Saravanan, P. Carpenter, Alex Ramírez

Abstract: Energy efficiency is one of the key challenges in high-performance computing (HPC). The current target of 1 ExaFlop in 20 MW requires a ten-fold improvement in energy efficiency, which is only possible through significant improvements in energy efficiency throughout the system. Interconnects are particularly inefficient, since their links are always on, consuming full power in order to provide low latency, even though the average interconnect utilization is low. To address this, the Ethernet standards committee in charge of 40/100/400Gb Ethernet has opted to include protocols that define low power modes, specifically Fast-Wake, alongside the older Deep-Sleep, to make interconnect links energy proportional. With these standards ratified as recently as March 2014, it is unclear how these low power modes can be used in HPC. While energy efficiency is critical, techniques with excessive performance overheads are unlikely to be adopted in HPC. To this end, this paper performs the first detailed analysis of Fast-Wake mode for link energy savings in the context of HPC. Our results show that a combination of Fast-Wake and Deep-Sleep can reduce link energy by up to 70% with less than 1% performance overhead. However, we show how the parameters of these low power modes must be carefully configured to obtain the right trade-offs in energy and performance. We believe that our analysis could benefit interconnect vendors looking to use these low power modes for deployment in HPC.
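The trade-off described in the abstract above, between energy saved in a sleep mode and latency added on wake-up, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's simulator: the power fractions, wake latencies, and the policy (enter Fast-Wake as soon as the link goes idle, drop to Deep-Sleep after a configurable idle threshold) are all hypothetical placeholders, and transition energy is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SleepMode:
    name: str
    power_fraction: float   # power while asleep, as a fraction of active power
    wake_latency_us: float  # time needed to return to the active state


# Placeholder parameters, NOT taken from the paper or from the Ethernet standard.
FAST_WAKE = SleepMode("fast-wake", power_fraction=0.6, wake_latency_us=0.5)
DEEP_SLEEP = SleepMode("deep-sleep", power_fraction=0.1, wake_latency_us=5.0)


def idle_gap_cost(gap_us: float, deep_sleep_after_us: float) -> tuple[float, float]:
    """Return (energy, added_latency_us) for a single idle gap.

    Energy is expressed in units of active-power x microseconds. The link is
    assumed to enter Fast-Wake immediately and Deep-Sleep once it has been
    idle for `deep_sleep_after_us`.
    """
    if gap_us <= deep_sleep_after_us:
        return gap_us * FAST_WAKE.power_fraction, FAST_WAKE.wake_latency_us
    fast_part = deep_sleep_after_us * FAST_WAKE.power_fraction
    deep_part = (gap_us - deep_sleep_after_us) * DEEP_SLEEP.power_fraction
    return fast_part + deep_part, DEEP_SLEEP.wake_latency_us


def evaluate(gaps_us: list[float], deep_sleep_after_us: float) -> tuple[float, float]:
    """Return (fraction of idle energy saved vs. an always-on link, total added latency)."""
    costs = [idle_gap_cost(gap, deep_sleep_after_us) for gap in gaps_us]
    energy = sum(e for e, _ in costs)
    added_latency = sum(lat for _, lat in costs)
    always_on_energy = sum(gaps_us)  # power fraction 1.0 for the whole gap
    return 1.0 - energy / always_on_energy, added_latency


# One short and one long idle gap, with a hypothetical 10 us threshold.
saved, added = evaluate([2.0, 100.0], deep_sleep_after_us=10.0)
print(f"idle energy saved: {saved:.1%}, added wake latency: {added} us")
```

Lowering the threshold in this model saves more energy on long gaps but charges the larger Deep-Sleep wake latency to more of them, which is the kind of parameter sensitivity the abstract warns about.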

Citations: 7

Pool directory: Efficient coherence tracking with dynamic directory allocation in many-core systems

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357165

Sudhanshu Shukla, Mainak Chaudhuri

Abstract: The coherence directory in a chip-multiprocessor keeps track of each memory block inside the cache hierarchy and plays a significant role in offering a scalable shared memory abstraction in many-core systems. Multi-threaded applications typically require two types of directory entries, namely, limited pointer entries tracking a few sharers of a block and bitvector entries tracking a larger number of sharers for widely shared blocks. Recent proposals aiming to optimize the average number of bits per directory entry have organized the directory as either a static mix of these two types of entries or a collection of relatively short bitvector entries that can encode either a limited number of sharer pointers or a larger number of sharers hierarchically. In this paper, we present a directory organization that facilitates allocation of two different types of directory entries dynamically. Our design maintains a pool of limited pointer entries, where each entry can also double as a segment directory entry encoding the sharers in a cluster of cores. Each tag in the primary sparse directory array has a pointer that can either represent a sharer or point to an entry in the pool. When multiple segment directory entries are needed to encode all the sharers of a block, our pool management protocol guarantees that all these entries are allocated contiguously so that maintaining a pointer to the head entry is enough. Such a design offers significant flexibility in sharer encoding and allows us to independently size the sparse directory array and the pool. Detailed simulation results show that our proposal incorporated in a 128-core system running multi-threaded applications drawn from scientific, general-purpose, and commercial computing domains can offer, on average, 5% improvement in performance and 20% savings in interconnect traffic compared to the state-of-the-art scalable coherence directory (SCD) proposal when using a 1/16× sparse directory.
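The organization described in the abstract above, a tag pointer that names either a single sharer or the head of a contiguous run of pool entries, can be sketched as a small data structure. This is a loose illustration only: the cluster size, the first-fit search, the class and function names, and the entry layout are assumptions made for the example, and the limited-pointer use of pool entries and the paper's pool management protocol are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

CORES_PER_SEGMENT = 16  # assumed cluster size, not taken from the paper


@dataclass
class TagPointer:
    """Pointer field of a sparse-directory tag: one sharer, or a pool index."""
    is_pool_index: bool
    value: int  # core id if is_pool_index is False, else index of the head pool entry


class Pool:
    def __init__(self, size: int) -> None:
        # None marks a free entry; otherwise (segment id, sharer bitvector).
        self.entries: list = [None] * size

    def find_contiguous_free(self, count: int) -> Optional[int]:
        """First-fit search for `count` adjacent free entries; return the head index."""
        run = 0
        for i, entry in enumerate(self.entries):
            run = run + 1 if entry is None else 0
            if run == count:
                return i - count + 1
        return None


def encode_sharers(sharers: set, pool: Pool) -> Optional[TagPointer]:
    """Encode a non-empty sharer set in the tag itself or in contiguous pool entries."""
    if not sharers:
        raise ValueError("a tracked block needs at least one sharer")
    if len(sharers) == 1:
        return TagPointer(False, next(iter(sharers)))
    by_segment: dict = {}
    for core in sharers:
        segment = core // CORES_PER_SEGMENT
        by_segment[segment] = by_segment.get(segment, 0) | (1 << (core % CORES_PER_SEGMENT))
    head = pool.find_contiguous_free(len(by_segment))
    if head is None:
        return None  # pool exhausted; a real design would need a fallback policy
    for offset, item in enumerate(sorted(by_segment.items())):
        pool.entries[head + offset] = item
    return TagPointer(True, head)


pool = Pool(size=8)
single = encode_sharers({3}, pool)          # fits in the tag, pool untouched
shared = encode_sharers({1, 17, 40}, pool)  # three clusters, three adjacent entries
print(single, shared, pool.entries[:3])
```

Keeping the segment entries adjacent is what lets the tag store only the head index, as the abstract notes. The cost is that allocation must find a free run of the right length rather than any free entry.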

Citations: 8

Sequential C-code to distributed pipelined heterogeneous MPSoC synthesis for streaming applications

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357106

Jude Angelo Ambrose, Yusuke Yachide, Kapil Batra, Jorgen Peddersen, S. Parameswaran

Abstract: Pipelines of processors allow the execution of a sequential streaming program on multiple processors. However, partitioning sequential code for Multiprocessor Systems-on-Chips (MPSoCs), and then creating the MPSoC platform for the sequential code to execute, is a challenging problem. Parallelizing/pipelining statements within a control loop will improve the throughput of each iteration and the overall performance. Existing techniques for parallelizing control loops, such as OpenMP, are agnostic of the underlying MPSoC architecture, thus limiting the possibilities for further parallelization. Previous techniques related to the distribution of statements to MPSoCs considered only homogeneous processors and were not automated. In this paper, we propose a novel automated parallelization/pipelining approach to synthesize a heterogeneous distributed pipelined MPSoC to improve the throughput of a loop (critical for streaming applications). An Integer Linear Programming (ILP)-based formulation to map statements to processor configurations is presented, in order to find the most suitable heterogeneous processor configurations for maximal throughput. Our approach complements state-of-the-art parallelization techniques, such as OpenMP, to further improve the performance of an application. A complete MPSoC platform, for the Tensilica framework, is automatically generated within minutes using our approach for the tested applications.
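The abstract above does not give the ILP formulation, so the following sketch swaps in exhaustive search over a toy instance to show the kind of decision involved: split an ordered list of loop statements into pipeline stages, pick a processor configuration per stage, and minimize the slowest stage (which bounds throughput). The cost tables, the area budget, and the function name are hypothetical and not taken from the paper.

```python
from itertools import combinations, product


def best_pipeline(cycles, area, num_stages, area_budget):
    """Exhaustively search stage boundaries and per-stage configurations.

    cycles[s][c] is the (assumed) cycle count of statement s on configuration c,
    and area[c] is the (assumed) area cost of configuration c. Returns
    (bottleneck_cycles, stage_bounds, config_per_stage), or None if no
    assignment fits the area budget.
    """
    n = len(cycles)
    best = None
    # Choose cut points so that statements stay in program order within stages.
    for cuts in combinations(range(1, n), num_stages - 1):
        bounds = (0, *cuts, n)
        stages = [range(bounds[i], bounds[i + 1]) for i in range(num_stages)]
        for choice in product(range(len(area)), repeat=num_stages):
            if sum(area[c] for c in choice) > area_budget:
                continue
            # Pipeline throughput is limited by the slowest stage.
            bottleneck = max(
                sum(cycles[s][c] for s in stage) for stage, c in zip(stages, choice)
            )
            if best is None or bottleneck < best[0]:
                best = (bottleneck, bounds, choice)
    return best


# Four statements, two configurations (0 = small and slow, 1 = large and fast).
cycles = [[4, 2], [6, 3], [2, 1], [8, 4]]
area = [1, 3]
result = best_pipeline(cycles, area, num_stages=2, area_budget=4)
print(result)
```

Exhaustive search grows exponentially with the number of statements and stages, which is the usual motivation for handing the same objective and constraints to an ILP solver instead.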

Citations: 1

Logic simplification by minterm complement for error tolerant application

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357089

H. Ichihara, Tomoya Inaoka, T. Iwagaki, Tomoo Inoue

Citations: 6

Immediate sleep: Reducing energy impact of peripheral circuits in STT-MRAM caches

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357096

Eishi Arima, H. Noguchi, Takashi Nakada, Shinobu Miwa, S. Takeda, S. Fujita, Hiroshi Nakamura

Citations: 9

An orchestrated approach to efficiently manage resources in heterogeneous system architectures

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357104

C. Bolchini, Gianluca Durelli, A. Miele, G. Pallotta, M. Santambrogio

Abstract: Nowadays, we are witnessing trends in technology, fabrication processes and computing architectures that lead to the design and development of processing systems composed of a significant number of independent, heterogeneous execution resources. The aim is to achieve high performance while also addressing other aspects, such as energy consumption. However, heterogeneity comes at the cost of greater design and management complexity. To reach an optimal solution, system architects need to take into account the efficiency of the system's units, i.e., general purpose processors, possibly with one or more kinds of accelerators (e.g., GPUs or FPGAs), as well as the workload. This often leads to inefficiency in the exploitation of such resources, and therefore in performance/energy. Within this context, we propose a runtime resource manager able to observe the system execution and to dynamically optimise its behaviour with respect to one or more identified functional parameters, according to the architectural characteristics and the users' and the applications' needs. Such an adaptation characteristic is intrinsically embedded in the device as a software layer, called Orchestrator, able to adapt the runtime resource management according to the target objectives and to the inputs from the external environment.

Citations: 6

Reactive clocks with variability-tracking jitter

2015 33rd IEEE International Conference on Computer Design (ICCD) Pub Date : 2015-10-18 DOI: 10.1109/ICCD.2015.7357159

J. Cortadella, L. Lavagno, P. López, Marc Lupon, A. Moreno-Conde, Antoni Roca, S. Sapatnekar

Abstract: The growing variability in nanoelectronic devices, due to uncertainties from the manufacturing process and environmental conditions (power supply, temperature, aging), requires increasing design guardbands, forcing circuits to work with conservative clock frequencies. Various schemes for clock generation based on ring oscillators and adaptive clocks have been proposed with the goal of mitigating the power and performance losses attributable to variability. However, there has been no systematic analysis to quantify the benefits of such schemes, and no sign-off method has been proposed for timing correctness. This paper presents and analyzes a Reactive Clocking scheme with Variability-Tracking Jitter (RClk) that uses variability as an opportunity to reduce power by continuously adjusting the clock frequency to the varying environmental conditions, and thus reduces guardband margins significantly. Power can be reduced between 20% and 40% at iso-performance, and performance can be boosted by similar amounts at iso-power. Additionally, energy savings can be translated to substantial advantages in terms of reliability and thermal management. More importantly, the technology can be adopted with minimal modifications to conventional EDA flows.

Citations: 11