2016 IEEE 34th International Conference on Computer Design (ICCD): latest papers, page 8

MFAP: Fair Allocation between fully backlogged and non-fully backlogged applications

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753343

Yan Sui, Chun Yang, Dong Tong, Xianhua Liu, Xu Cheng

Citations: 0

AIBA: An Automated Intra-cycle Behavioral Analysis for SystemC-based design exploration

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753303

Mehran Goli, Jannis Stoppe, R. Drechsler

Citations: 18

Quantifying the difference in resource demand among classic and modern NoC workloads

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753314

Amirhossein Mirhosseini, Mohammad Sadrosadati, Maryam Zare, H. Sarbazi-Azad

Citations: 7

Understanding and alleviating intra-die and intra-DIMM parameter variation in the memory system

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753283

Meysam Taassori, Ali Shafiee, R. Balasubramonian

Abstract: Continued process scaling must overcome several manufacturing challenges. At the same time, industry is exploring many new memory technologies that require new manufacturing processes. In such challenging fabrication regimes, parameter variation (PV) and yield will be important problems. While many recent bodies of work have targeted PV in processors, few have targeted PV in the memory system. Mitigation techniques have either focused on refresh, or have focused on inter-die variation. In this work, with empirical measurements, we first show that PV and specifically intra-die PV is indeed a real phenomenon in modern DRAM chips. We show that this intra-die PV can impact timing parameters for different banks within a DRAM chip. In response to growing PV, memory timing parameters will likely be set very conservatively to accommodate the worst case. To overcome these worst-case limitations, we propose the design of a reconfigurable memory module that detects PV in the field and organizes the memory system into fast/slow regions. This requires changes to the memory controller and to buffer chips on DIMMs. Further, OS migration policies can move frequently accessed pages to the fast regions. This overall approach not only improves performance and energy, it also provides a configurable platform for systems that can tolerate errors or approximation. The proposed system yields an average performance improvement of 12.6% in DRAM systems, and 25.5% in NVM systems.

Citations: 4

Speculative path power estimation using trace-driven simulations during high-level design phase

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753350

Saumya Chandra, R. Jayaseelan, Ravi Bhargava

Abstract: Today power is an important design metric and the ongoing goal of microprocessor designers is to maximize performance within specified power targets. The key to achieving this goal is the ability to accurately estimate power and performance design points of future products during the high-level micro-architectural design phase (HLD). These estimates are heavily used for feature analysis and product feasibility studies. Most performance and power simulators across the industry use the trace-driven simulation model (TDM) as opposed to an execution-driven model (EDM). This is because, in general, trace-driven models: (i) have faster turnaround time; (ii) require significantly lower resources in terms of disk space, CPU time and memory footprint; and (iii) are more robust, portable and well understood. However, TDM simulations lack the ability to accurately capture the flow of speculative path (or wrong path) execution following a branch mispredict in an out-of-order processor pipeline. This leads to inaccuracies in power and performance estimates. On the other hand, in the EDM method, input is an executable and the model can fetch and execute instructions down the speculative path on a branch mispredict. As such it enables us to accurately account for the impact of the speculative path activity. However, it is slower, prone to failures, and has much higher development and validation effort. In this paper we compare and analyze performance and power estimates from TDM and EDM simulations for the same workload phases. We observe that the impact of wrong path on power estimates is significantly higher than on the performance estimates. Using results from our analysis, we develop a methodology to account for power consumption during wrong path execution in TDM simulations. We show that the proposed methodology can provide power estimates approaching EDM-based accuracy while not sacrificing the speed and flexibility of the trace-driven models.

Citations: 1

Refresh-aware loop scheduling for high performance low power volatile STT-RAM

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753282

Keni Qiu, Junpeng Luo, Zhiyao Gong, Wei-gong Zhang, Jing Wang, Yuanchao Xu, Tao Li, C. Xue

Abstract: The highlighted advantages of low leakage power, high storage density and immunity to electromagnetic radiation make STT-RAM a promising candidate to build cache, SPM or main memory in embedded systems. However, write operations on STT-RAM have considerably longer latency and higher energy consumption than conventional SRAM. To solve this problem, researchers have proposed to relax STT-RAM's non-volatility and to have it work in a fast and low power mode. Under this volatile mode, refresh operations are needed to guarantee data correctness if the data lifespan is larger than the retention time. It is observed that this refresh overhead is significant for data in stencil loops with the characteristic of constant read and write dependencies. This paper proposes a loop scheduling technique which can traverse loops in a new direction such that data lifespan can be greatly shortened. Therefore, overall refresh overhead can be efficiently mitigated so as to improve performance and reduce power consumption. The experimental results indicate that access latency and dynamic energy can be improved by 21.4% to 96.0% and 22.0% to 95.5% respectively by the proposed scheduling scheme.

Citations: 10

Process variations-aware resistive associative processor design

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753260

Hasan Erdem Yantır, M. Fouda, A. Eltawil, F. Kurdahi

Citations: 6

Energy aware routing of multi-level Network-on-Chip traffic

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753330

Vasil Pano, I. Yilmaz, A. More, B. Taskin

Abstract: The emergence of Network-on-Chip (NoC) as a communication paradigm for Multi-Processor System-on-Chips (MPSoCs) significantly exacerbates the need to provide a methodology that optimizes the energy consumption of the overall system. This is especially important when factoring in current Network-on-Chip advances which have multiple communication media such as on-chip wireless or nano-photonics links, hybrid with traditional wired links. All of these media have different energy profiles, and if not taken into consideration the system will incur a higher power consumption throughout the runtime of the application. In this work, the case for EDP (energy-delay product) optimization between different levels of a multi-level Network-on-Chip is presented. Using a dynamic, energy aware algorithm, the EDP improvement is compared to a multi-level Network-on-Chip using a statically optimized routing. The proposed routing algorithm handles the different types of energy-delay profiles of multiple links. The end product is a methodology that lowers the overall energy consumption by optimizing the energy profile of the Network-on-Chip while also minimizing the network delay.

Citations: 0
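
The energy-delay product (EDP) named in the abstract above is simply energy multiplied by delay, so comparing candidate communication media by EDP can be illustrated in a few lines. The sketch below is not the paper's routing algorithm: the `Link` type, the three media names, and all energy and delay figures are hypothetical values chosen only to show the comparison.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Link:
    """Hypothetical description of one candidate NoC link."""
    name: str
    energy_pj: float  # assumed energy per flit, in picojoules
    delay_ns: float   # assumed latency per flit, in nanoseconds


def edp(link: Link) -> float:
    """Energy-delay product: energy multiplied by delay."""
    return link.energy_pj * link.delay_ns


def pick_min_edp(links: Sequence[Link]) -> Link:
    """Return the candidate link with the smallest energy-delay product."""
    return min(links, key=edp)


# Illustrative numbers only; they are not taken from the paper.
candidates = [
    Link("wired", energy_pj=4.0, delay_ns=6.0),
    Link("wireless", energy_pj=9.0, delay_ns=2.0),
    Link("photonic", energy_pj=5.0, delay_ns=3.0),
]
best = pick_min_edp(candidates)
print(best.name, edp(best))
```

With these made-up figures the lowest-energy link (wired) and the lowest-delay link (wireless) both lose to the link with the best product of the two, which is the trade-off an EDP metric is meant to capture.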

ONAC: Optimal number of active cores detector for energy efficient GPU computing

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753335

Xian Zhu, Mihir Awatramani, D. Rover, Joseph Zambreno

Abstract: Graphics Processing Units (GPUs) have become a prevalent platform for high throughput general purpose computing. The peak computational throughput of GPUs has been steadily increasing with each technology node by scaling the number of cores on the chip. Although this vastly improves the performance of several compute-intensive applications, our experiments show that some applications can achieve peak performance without utilizing all cores on the chip. We refer to the number of cores at which performance of an application saturates as the optimal number of active cores (Nopt). We propose executing the application on Nopt cores, and power-gating the unused cores to reduce static power consumption. Towards this target, we present ONAC (Optimal Number of Active Cores detector), a runtime technique to detect Nopt. ONAC uses a novel estimation model, which significantly reduces the number of hardware samples taken to detect the optimal core count, compared to a sequential detection technique (Seq-Det). We implement ONAC and Seq-Det in a cycle-level GPU performance simulator and analyze their effect on performance, power and energy. Our evaluation shows that ONAC and Seq-Det can reduce energy consumption by 20% and 10% on average for memory-intensive applications, without sacrificing more than 2% performance. The higher energy savings for ONAC come from reducing the detection time by 45% as compared to Seq-Det.

Citations: 4
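
The ONAC abstract above defines Nopt as the core count at which performance saturates and mentions a sequential detection baseline. The abstract does not specify either detector, so the following is only a minimal sketch of one plausible sequential search: sample performance at 1, 2, 3, ... cores and stop once an extra core gains less than a threshold. The function names, the 2% `min_gain` threshold, and the toy throughput model are all assumptions made for illustration.

```python
from typing import Callable


def sequential_detect(
    measure: Callable[[int], float],
    max_cores: int,
    min_gain: float = 0.02,
) -> int:
    """Return the last core count whose addition still improved measured
    performance by at least min_gain (relative to the previous count)."""
    best_n = 1
    prev = measure(1)
    for n in range(2, max_cores + 1):
        cur = measure(n)
        if cur < prev * (1.0 + min_gain):
            break  # performance has saturated; more cores do not help
        best_n, prev = n, cur
    return best_n


def toy_throughput(n: int) -> float:
    """Hypothetical workload: scales linearly up to 6 cores, then is
    limited by some other resource such as memory bandwidth."""
    return float(min(n, 6))


n_opt = sequential_detect(toy_throughput, max_cores=16)
print(n_opt)
```

A search of this kind takes one sample per core count up to saturation, which is the sampling cost the abstract says ONAC's estimation model is designed to reduce.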

Unveiling difficult bugs in address translation caching arrays for effective post-silicon validation

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753339

G. Papadimitriou, D. Gizopoulos, Athanasios Chatzidimitriou, Tom Kolan, A. Koyfman, Ronny Morad, V. Sokhin

Abstract: Post-silicon validation is one of the most important parts of the microprocessor prototype chip lifecycle. It is the last chance for debug engineers to detect defects and bugs that escaped pre-silicon verification, before the chip is released to the market. Effective solutions are required to harness the peak performance of the hardware prototype and evaluate whether the microprocessor chip is fully compliant with the instruction set and other specifications. We perform a comprehensive experimental study on a state-of-the-art microarchitecture to assess and identify the most difficult bugs in address translation caching arrays (multi-level TLBs and MMU Caches), and explain why these bugs persist across generations. We also categorize them into distinct bug scenarios. We then propose a novel methodology for generating random self-checking stimuli programs, which expose and detect such bug scenarios. Our experimental results show that the proposed method can detect difficult bugs that are likely to be missed by traditional post-silicon validation techniques.

Citations: 3