2016 IEEE 34th International Conference on Computer Design (ICCD)最新文献

筛选
英文 中文
MFAP: Fair Allocation between fully backlogged and non-fully backlogged applications MFAP:完全积压和非完全积压应用程序之间的公平分配
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753343
Yan Sui, Chun Yang, Dong Tong, Xianhua Liu, Xu Cheng
{"title":"MFAP: Fair Allocation between fully backlogged and non-fully backlogged applications","authors":"Yan Sui, Chun Yang, Dong Tong, Xianhua Liu, Xu Cheng","doi":"10.1109/ICCD.2016.7753343","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753343","url":null,"abstract":"In this paper, we consider the problem of ensuring fairness in systems serving a mixture of fully backlogged applications, which continuously demand resources, and non-fully backlogged applications. We introduce a fairness metric, called interference fairness, the basic idea underlying which is that the interference caused by application A for another application B should be equal to that caused by B for A. To effectively and efficiently guarantee this fairness metric, we propose Mutual Fair Allocation Policy (MFAP), a simple and powerful resource sharing policy, and show how it guarantees interference fairness between any pair of applications. We also show that MFAP, unlike other viable policies, satisfies several highly desirable properties, including some from game theory, as well as common sense intuitions. As a use case, we implemented MFAP on a disk scheduling framework. The experimental results based on synthetic and real workloads show how our implementation achieved interference fairness and improved non-fully backlogged applications performance.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126607219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
AIBA: An Automated Intra-cycle Behavioral Analysis for SystemC-based design exploration AIBA:用于基于系统c的设计探索的自动化周期内行为分析
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753303
Mehran Goli, Jannis Stoppe, R. Drechsler
{"title":"AIBA: An Automated Intra-cycle Behavioral Analysis for SystemC-based design exploration","authors":"Mehran Goli, Jannis Stoppe, R. Drechsler","doi":"10.1109/ICCD.2016.7753303","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753303","url":null,"abstract":"In order to overcome the ever increasing complexity of digital circuits, system design at the Electronic System Level (ESL) has become an area of active research. SystemC provides designers with a readily-available ESL framework, allowing them to design mixed hardware/software systems using a standardized C++ library. The analysis of the resulting designs is crucial to e.g. apply additional validation steps or assist designers during the development process. Existing approaches focus on the extraction of static information, providing designers with models that describe the structure of their system but not its behavior. In this paper, we introduce the Automated Intra-cycle Behavioral Analysis tool, AIBA. AIBA utilizes the GNU debugger to execute a two-step analysis that retrieves behavioral and architectural information of ESL designs. The proposed method is completely non-intrusive, allowing both SystemC designs and the standard tool flow to be used without any modification. Case studies confirm the benefits of the approach.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122007863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
Quantifying the difference in resource demand among classic and modern NoC workloads 量化经典和现代NoC工作负载之间的资源需求差异
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753314
Amirhossein Mirhosseini, Mohammad Sadrosadati, Maryam Zare, H. Sarbazi-Azad
{"title":"Quantifying the difference in resource demand among classic and modern NoC workloads","authors":"Amirhossein Mirhosseini, Mohammad Sadrosadati, Maryam Zare, H. Sarbazi-Azad","doi":"10.1109/ICCD.2016.7753314","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753314","url":null,"abstract":"This paper quantifies the difference in resource demand between modern and classic NoC workloads. In the paper, we show that modern workloads are able to better utilize higher numbers of VCs and smaller C factors in order to attain performance and energy efficiency. This is because of the high throughput and possible local congestions in their traffic pattern. As a result, such workloads are more suitable for concurrency and redundancy energy reduction techniques where the voltage and frequency are reduced simultaneously and the increased power budget is used for introducing additional resources to the network in order to improve the performance.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122042643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Understanding and alleviating intra-die and intra-DIMM parameter variation in the memory system 了解和减轻内存系统中芯片内部和dimm内部参数的变化
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753283
Meysam Taassori, Ali Shafiee, R. Balasubramonian
{"title":"Understanding and alleviating intra-die and intra-DIMM parameter variation in the memory system","authors":"Meysam Taassori, Ali Shafiee, R. Balasubramonian","doi":"10.1109/ICCD.2016.7753283","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753283","url":null,"abstract":"Continued process scaling must overcome several manufacturing challenges. At the same time, industry is exploring many new memory technologies that require new manufacturing processes. In such challenging fabrication regimes, parameter variation (PV) and yield will be important problems. While many recent bodies of work have targeted PV in processors, few have targeted PV in the memory system. Mitigation techniques have either focused on refresh, or have focused on inter-die variation. In this work, with empirical measurements, we first show that PV and specifically intra-die PV is indeed a real phenomenon in modern DRAM chips. We show that this intra-die PV can impact timing parameters for different banks within a DRAM chip. In response to growing PV, memory timing parameters will likely be set very conservatively to accommodate the worst case. To overcome these worst-case limitations, we propose the design of a reconfigurable memory module that detects PV in the field and organizes the memory system into fast/slow regions. This requires changes to the memory controller and to buffer chips on DIMMs. Further, OS migration policies can move frequently accessed pages to the fast regions. This overall approach not only improves performance and energy, it also provides a configurable platform for systems that can tolerate errors or approximation. The proposed system yields an average performance improvement of 12.6% in DRAM systems, and 25.5% in NVM systems.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130403498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Speculative path power estimation using trace-driven simulations during high-level design phase 在高级设计阶段使用跟踪驱动模拟的推测路径功率估计
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753350
Saumya Chandra, R. Jayaseelan, Ravi Bhargava
{"title":"Speculative path power estimation using trace-driven simulations during high-level design phase","authors":"Saumya Chandra, R. Jayaseelan, Ravi Bhargava","doi":"10.1109/ICCD.2016.7753350","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753350","url":null,"abstract":"Today power is an important design metric and the ongoing goal of microprocessor designers is to maximize performance within specified power targets. The key to achieving this goal is the ability to accurately estimate power and performance design points of future products during the high-level micro-architectural design phase (HLD). These estimates are heavily used for feature analysis and product feasibility studies. Most performance and power simulators across the industry use the trace-driven simulation model (TDM) as opposed to an execution driven model (EDM). This is because, in general, trace-driven models: (i) have faster turnaround time; (ii) require significantly lower resources in terms of disk space, CPU time and memory footprint; and (iii) are more robust, portable and well understood. However, TDM simulations lack the ability to accurately capture the flow of speculative path (or wrong path) 1 execution following a branch mispredict in an out-of-order processor pipeline. This leads to inaccuracies in power and performance estimates. On the other hand, in the EDM method, input is an executable and the model can fetch and execute instructions down the speculative path on a branch mispredict. As such it enables us to accurately account for the impact of the speculative path activity. However, it is slower, prone to failures, and has much higher development and validation effort. In this paper we compare and analyze performance and power estimates from TDM and EDM simulations for the same workload phases. We observe that the impact of wrong path on power estimates is significantly higher than on the performance estimates. Using results from our analysis, we develop a methodology to account for power consumption during wrong path execution in TDM simulations. We show that the proposed methodology can provide power estimates approaching EDM-based accuracy while not sacrificing the speed and flexibility of the trace-driven models.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130033364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Refresh-aware loop scheduling for high performance low power volatile STT-RAM 高性能低功耗易失性STT-RAM的刷新感知循环调度
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753282
Keni Qiu, Junpeng Luo, Zhiyao Gong, Wei-gong Zhang, Jing Wang, Yuanchao Xu, Tao Li, C. Xue
{"title":"Refresh-aware loop scheduling for high performance low power volatile STT-RAM","authors":"Keni Qiu, Junpeng Luo, Zhiyao Gong, Wei-gong Zhang, Jing Wang, Yuanchao Xu, Tao Li, C. Xue","doi":"10.1109/ICCD.2016.7753282","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753282","url":null,"abstract":"The highlighted advantages of low leakage power, high storage density and immunity to electronic magnetic radiation make STT-RAM a promising candidate to build cache, SPM or main memory in embedded systems. However, write operations on STT-RAM have considerably longer latency and higher energy consumption than conventional SRAM. To solve this problem, researchers have proposed to relax STT-RAM's non-volatility and to have it work in a fast and low power mode. Under this volatile mode, refresh operations are needed to guarantee data correctness if their lifespan is larger than the retention time. It is observed that this refresh overhead is significant for data in stencil loops with the characteristic of constant read and write dependencies. This paper proposes a loop scheduling technique which can traverse loops in a new direction such that data lifespan can be greatly shortened. Therefore, overall refresh overhead can be efficiently mitigated so as to improve performance and reduce power consumption. The experimental results indicate that access latency and dynamic energy can be improved by 21.4~96.0% and 22.0~95.5% respectively by the proposed scheduling scheme.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123933871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Process variations-aware resistive associative processor design 过程变化感知电阻关联处理器设计
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753260
Hasan Erdem Yantır, M. Fouda, A. Eltawil, F. Kurdahi
{"title":"Process variations-aware resistive associative processor design","authors":"Hasan Erdem Yantır, M. Fouda, A. Eltawil, F. Kurdahi","doi":"10.1109/ICCD.2016.7753260","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753260","url":null,"abstract":"Recent breakthroughs in memristive devices have demonstrated the potential of using resistive content addressable memories for associative processing. These architectures enable ultra-high density integrated circuits along with low-power computation. However, the reliability of memristive elements is limiting the widespread adoption of these architectures. In this study, we address the reliability issues that arise in high density, resistive associative processor architectures. We propose methods to design process variation immune resistive content addressable memories and minimize the error probabilities. According to SPICE-based circuit simulations, the reliability of the circuit increases significantly and thus positively influences the accuracy of arithmetic operations as well.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124269635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Energy aware routing of multi-level Network-on-Chip traffic 多级片上网络流量的能量感知路由
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753330
Vasil Pano, I. Yilmaz, A. More, B. Taskin
{"title":"Energy aware routing of multi-level Network-on-Chip traffic","authors":"Vasil Pano, I. Yilmaz, A. More, B. Taskin","doi":"10.1109/ICCD.2016.7753330","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753330","url":null,"abstract":"The emergence of Network-on-Chip (NoC) as a communication paradigm for Multi-Processor System-on-Chips (MPSoCs) significantly exacerbates the need to provide a methodology that optimizes the energy consumption of the overall system. This is especially important when factoring in current Network-on-Chip advances which have multiple communication media such as on-chip wireless or nano-photonics links, hybrid with traditional wired links. All of these media have different energy profiles, and if not taken into consideration the system will incur a higher power consumption throughout the runtime of the application. In this work, the case for EDP (energy-delay product) optimization between different levels of a multi-level Network-on-Chip is presented. Using a dynamic, energy aware algorithm, the EDP improvement is compared to a multi-level Network-on-Chip using a statically optimized routing. The proposed routing algorithm handles the different types of energy-delay profiles of multiple links. The end product is a methodology that lowers the overall energy consumption by optimizing the energy profile of the Network-on-Chip while also minimizing the network delay.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115029128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ONAC: Optimal number of active cores detector for energy efficient GPU computing ONAC:用于高效GPU计算的最优活动内核检测器数量
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753335
Xian Zhu, Mihir Awatramani, D. Rover, Joseph Zambreno
{"title":"ONAC: Optimal number of active cores detector for energy efficient GPU computing","authors":"Xian Zhu, Mihir Awatramani, D. Rover, Joseph Zambreno","doi":"10.1109/ICCD.2016.7753335","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753335","url":null,"abstract":"Graphics Processing Units (GPUs) have become a prevalent platform for high throughput general purpose computing. The peak computational throughput of GPUs has been steadily increasing with each technology node by scaling the number of cores on the chip. Although this vastly improves the performance of several compute-intensive applications, our experiments show that some applications can achieve peak performance without utilizing all cores on the chip. We refer to the number of cores at which performance of an application saturates as the optimal number of active cores (Nopt). We propose executing the application on Nopt cores, and power-gating the unused cores to reduce static power consumption. Towards this target, we present ONAC (Optimal Number of Active Cores detector), a runtime technique to detect Nopt. ONAC uses a novel estimation model, which significantly reduces the number of hardware samples taken to detect the optimal core count, compared to a sequential detection technique (Seq-Det). We implement ONAC and Seq-Det in a cycle-level GPU performance simulator and analyze their effect on performance, power and energy. Our evaluation shows that ONAC and Seq-Det can reduce energy consumption by 20% and 10% on average for memory-intensive applications, without sacrificing more than 2% performance. The higher energy savings for ONAC comes from reducing the detection time by 45% as compared to Seq-Det.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114386236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Unveiling difficult bugs in address translation caching arrays for effective post-silicon validation 揭示地址转换缓存数组中的困难bug,以实现有效的后硅验证
2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753339
G. Papadimitriou, D. Gizopoulos, Athanasios Chatzidimitriou, Tom Kolan, A. Koyfman, Ronny Morad, V. Sokhin
{"title":"Unveiling difficult bugs in address translation caching arrays for effective post-silicon validation","authors":"G. Papadimitriou, D. Gizopoulos, Athanasios Chatzidimitriou, Tom Kolan, A. Koyfman, Ronny Morad, V. Sokhin","doi":"10.1109/ICCD.2016.7753339","DOIUrl":"https://doi.org/10.1109/ICCD.2016.7753339","url":null,"abstract":"Post-silicon validation is one of the most important parts of the microprocessor prototype chip lifecycle. It is the last chance for debug engineers to detect defects and bugs that escaped pre-silicon verification, before the chip is released to the market. Effective solutions are required to harness the peak performance of the hardware prototype and evaluate whether the microprocessor chip is fully compliant with the instruction set and other specifications. We perform a comprehensive experimental study on a state-of-the-art microarchitecture to assess and identify the most difficult bugs in address translation caching arrays (multi-level TLBs and MMU Caches), and explain why these bugs persist across generations. We also categorize them into distinct bug scenarios. We then propose a novel methodology for generating random self-checking stimuli programs, which expose and detect such bug scenarios. Our experimental results show that the proposed method can detect difficult bugs that are likely to be missed by traditional post-silicon validation techniques.","PeriodicalId":297899,"journal":{"name":"2016 IEEE 34th International Conference on Computer Design (ICCD)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133777274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信