2016 IEEE 34th International Conference on Computer Design (ICCD): Latest Papers, Page 6

Efficient mode changes in multi-mode systems

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753345

Akramul Azim, S. Fischmeister

Abstract: Multi-mode systems operate in distinct configurations (modes) but must still ensure timing guarantees during mode changes. A mode-change request occurs when a system that is already running in one mode needs to operate in a new one. Modes may share tasks, so the new mode may contain tasks that the old mode has already completed. Changing modes in a way that skips those completed tasks can decrease the workload of the new mode. Traditional mode-change protocols always look forward in time to schedule tasks, even though reusing already completed tasks may avoid re-executing them in the new mode and so shorten the switch. In this paper, we introduce the concept and design considerations for a mode-change technique that may use completed tasks stored in checkpoints to avoid unnecessary re-execution and facilitate faster execution of new-mode tasks. Through an example case study, experimental results demonstrate that the overhead of using checkpoints is low, and that rollback facilitates faster execution of new-mode tasks if completed tasks stored in checkpoints can be reused.

Citations: 6
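The reuse idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' protocol: the `Task` type, the `wcet_ms` field, the task names, and the timing values are all hypothetical, and the sketch ignores checkpoint overhead and deadlines. It only shows how skipping tasks whose results are already checkpointed shrinks the new mode's workload.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical task record: a name and an execution-time budget in ms."""
    name: str
    wcet_ms: float


def new_mode_workload(
    new_mode_tasks: Iterable[Task], checkpointed: Set[str]
) -> Tuple[List[Task], float]:
    """Return the tasks the new mode still has to run and their total time.

    Tasks whose names appear in `checkpointed` are assumed to have reusable
    results from the old mode, so they are skipped instead of re-executed.
    """
    remaining = [t for t in new_mode_tasks if t.name not in checkpointed]
    return remaining, sum(t.wcet_ms for t in remaining)
```

Calling `new_mode_workload` with an empty `checkpointed` set models a traditional forward-looking mode change in which every task of the new mode is executed again.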

CyHOP: A generic framework for real-time power-performance optimization in networked wearable motion sensors

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753320

Ramin Fallahzadeh, Hassan Ghasemzadeh

Abstract: Power consumption is a major obstacle in designing stringently resource-constrained wearables. Several system-level design considerations contribute to the energy consumption of these systems and must be taken into account during design. We propose a power-performance optimization framework, CyHOP (Cyclic and Holistic Optimization framework), for connected wearable motion sensors. While existing work focuses solely on one design parameter, our approach globally trades off activity-recognition performance against power consumption. CyHOP is capable of optimally adjusting the system to fulfill specific application needs. Using a smoothing technique, the initial multivariate non-convex optimization problem is reduced to a convex problem and solved using our derivative-free optimization approach, cyclic coordinate search. Our model performs a linear search by cycling through the system variables on each iteration until it converges to the global optimum. Using real-world data collected with wearable motion sensors during activity monitoring, we validate our approach with various performance thresholds ranging from 40% to 80%.

Citations: 1
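The cyclic coordinate search named in the CyHOP abstract can be sketched generically as follows. This is an illustration of the general technique, not CyHOP itself: the golden-section one-dimensional search, the box bounds, the tolerances, and the function names are assumptions chosen for the example, and the objective is assumed to be convex (as the abstract says the smoothed problem is).

```python
from typing import Callable, List, Sequence, Tuple


def golden_section(f: Callable[[float], float], lo: float, hi: float,
                   tol: float = 1e-8) -> float:
    """Derivative-free minimization of a unimodal function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


def cyclic_coordinate_search(
    objective: Callable[[Sequence[float]], float],
    start: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
    tol: float = 1e-6,
    max_cycles: int = 200,
) -> List[float]:
    """Minimize `objective` by optimizing one variable at a time, in a cycle."""
    x = list(start)
    for _ in range(max_cycles):
        previous = list(x)
        for i, (lo, hi) in enumerate(bounds):
            def along_axis(value: float, i: int = i) -> float:
                trial = list(x)
                trial[i] = value  # vary only coordinate i
                return objective(trial)
            x[i] = golden_section(along_axis, lo, hi)
        # Stop once a full cycle no longer moves any variable noticeably.
        if max(abs(a - b) for a, b in zip(x, previous)) < tol:
            break
    return x
```

For a convex objective the per-coordinate searches never increase the objective value, which is why no gradient information is needed; on a non-convex objective the same loop may stall at a point that is not a global optimum.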

AIBA: An Automated Intra-cycle Behavioral Analysis for SystemC-based design exploration

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753303

Mehran Goli, Jannis Stoppe, R. Drechsler

Citations: 18

Quantifying the difference in resource demand among classic and modern NoC workloads

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753314

Amirhossein Mirhosseini, Mohammad Sadrosadati, Maryam Zare, H. Sarbazi-Azad

Citations: 7

Understanding and alleviating intra-die and intra-DIMM parameter variation in the memory system

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753283

Meysam Taassori, Ali Shafiee, R. Balasubramonian

Abstract: Continued process scaling must overcome several manufacturing challenges. At the same time, industry is exploring many new memory technologies that require new manufacturing processes. In such challenging fabrication regimes, parameter variation (PV) and yield will be important problems. While many recent bodies of work have targeted PV in processors, few have targeted PV in the memory system. Mitigation techniques have focused either on refresh or on inter-die variation. In this work, with empirical measurements, we first show that PV, and specifically intra-die PV, is indeed a real phenomenon in modern DRAM chips. We show that this intra-die PV can impact timing parameters for different banks within a DRAM chip. In response to growing PV, memory timing parameters will likely be set very conservatively to accommodate the worst case. To overcome these worst-case limitations, we propose the design of a reconfigurable memory module that detects PV in the field and organizes the memory system into fast and slow regions. This requires changes to the memory controller and to buffer chips on DIMMs. Further, OS migration policies can move frequently accessed pages to the fast regions. This overall approach not only improves performance and energy but also provides a configurable platform for systems that can tolerate errors or approximation. The proposed system yields an average performance improvement of 12.6% in DRAM systems and 25.5% in NVM systems.

Citations: 4

Speculative path power estimation using trace-driven simulations during high-level design phase

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753350

Saumya Chandra, R. Jayaseelan, Ravi Bhargava

Abstract: Today, power is an important design metric, and the ongoing goal of microprocessor designers is to maximize performance within specified power targets. The key to achieving this goal is the ability to accurately estimate power and performance design points of future products during the high-level micro-architectural design phase (HLD). These estimates are heavily used for feature analysis and product feasibility studies. Most performance and power simulators across the industry use the trace-driven simulation model (TDM) as opposed to an execution-driven model (EDM). This is because, in general, trace-driven models: (i) have faster turnaround time; (ii) require significantly lower resources in terms of disk space, CPU time and memory footprint; and (iii) are more robust, portable and well understood. However, TDM simulations lack the ability to accurately capture the flow of speculative-path (or wrong-path) execution following a branch mispredict in an out-of-order processor pipeline. This leads to inaccuracies in power and performance estimates. On the other hand, in the EDM method, the input is an executable, and the model can fetch and execute instructions down the speculative path on a branch mispredict. As such, it enables us to accurately account for the impact of speculative-path activity. However, it is slower, prone to failures, and requires much more development and validation effort. In this paper, we compare and analyze performance and power estimates from TDM and EDM simulations for the same workload phases. We observe that the impact of the wrong path on power estimates is significantly higher than on performance estimates. Using results from our analysis, we develop a methodology to account for power consumption during wrong-path execution in TDM simulations. We show that the proposed methodology can provide power estimates approaching EDM-based accuracy without sacrificing the speed and flexibility of trace-driven models.

Citations: 1

Refresh-aware loop scheduling for high performance low power volatile STT-RAM

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753282

Keni Qiu, Junpeng Luo, Zhiyao Gong, Wei-gong Zhang, Jing Wang, Yuanchao Xu, Tao Li, C. Xue

Abstract: Its low leakage power, high storage density and immunity to electromagnetic radiation make STT-RAM a promising candidate for building caches, SPM or main memory in embedded systems. However, write operations on STT-RAM have considerably longer latency and higher energy consumption than on conventional SRAM. To solve this problem, researchers have proposed relaxing STT-RAM's non-volatility so that it works in a fast, low-power mode. In this volatile mode, refresh operations are needed to guarantee data correctness if the data's lifespan exceeds the retention time. It is observed that this refresh overhead is significant for data in stencil loops, which are characterized by constant read and write dependencies. This paper proposes a loop scheduling technique that traverses loops in a new direction such that data lifespan can be greatly shortened. Overall refresh overhead can therefore be mitigated, improving performance and reducing power consumption. The experimental results indicate that the proposed scheduling scheme improves access latency by 21.4% to 96.0% and dynamic energy by 22.0% to 95.5%.

Citations: 10

Process variations-aware resistive associative processor design

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753260

Hasan Erdem Yantır, M. Fouda, A. Eltawil, F. Kurdahi

Citations: 6

Energy aware routing of multi-level Network-on-Chip traffic

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753330

Vasil Pano, I. Yilmaz, A. More, B. Taskin

Abstract: The emergence of Network-on-Chip (NoC) as a communication paradigm for Multi-Processor System-on-Chips (MPSoCs) significantly increases the need for a methodology that optimizes the energy consumption of the overall system. This is especially important when factoring in current Network-on-Chip advances that combine multiple communication media, such as on-chip wireless or nano-photonic links, with traditional wired links. All of these media have different energy profiles, and if these are not taken into consideration, the system will incur higher power consumption throughout the runtime of the application. In this work, the case for EDP (energy-delay product) optimization between different levels of a multi-level Network-on-Chip is presented. Using a dynamic, energy-aware algorithm, the EDP improvement is compared to a multi-level Network-on-Chip using statically optimized routing. The proposed routing algorithm handles the different types of energy-delay profiles of multiple links. The end product is a methodology that lowers the overall energy consumption by optimizing the energy profile of the Network-on-Chip while also minimizing the network delay.

Citations: 0
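The energy-delay product (EDP) that the abstract above optimizes is simply energy multiplied by delay. The Python sketch below shows how candidate links with different energy-delay profiles could be compared on that metric. It is not the paper's routing algorithm: the `Link` type, the `pick_link` helper, the units, and any numbers passed in are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Link:
    """Hypothetical per-link profile for one candidate communication medium."""
    medium: str
    energy_pj: float  # energy to move one packet, in picojoules (assumed unit)
    delay_ns: float   # latency for one packet, in nanoseconds (assumed unit)

    @property
    def edp(self) -> float:
        """Energy-delay product: lower is better."""
        return self.energy_pj * self.delay_ns


def pick_link(candidates: Iterable[Link]) -> Link:
    """Return the candidate with the smallest energy-delay product."""
    return min(candidates, key=lambda link: link.edp)
```

A dynamic scheme would re-evaluate the profiles at run time as traffic changes, whereas a statically optimized routing would fix the choice ahead of time.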

ONAC: Optimal number of active cores detector for energy efficient GPU computing

2016 IEEE 34th International Conference on Computer Design (ICCD) Pub Date : 2016-10-01 DOI: 10.1109/ICCD.2016.7753335

Xian Zhu, Mihir Awatramani, D. Rover, Joseph Zambreno

Abstract: Graphics Processing Units (GPUs) have become a prevalent platform for high-throughput general-purpose computing. The peak computational throughput of GPUs has been steadily increasing with each technology node by scaling the number of cores on the chip. Although this vastly improves the performance of several compute-intensive applications, our experiments show that some applications can achieve peak performance without utilizing all cores on the chip. We refer to the number of cores at which the performance of an application saturates as the optimal number of active cores (Nopt). We propose executing the application on Nopt cores and power-gating the unused cores to reduce static power consumption. To this end, we present ONAC (Optimal Number of Active Cores detector), a runtime technique to detect Nopt. ONAC uses a novel estimation model, which significantly reduces the number of hardware samples taken to detect the optimal core count compared to a sequential detection technique (Seq-Det). We implement ONAC and Seq-Det in a cycle-level GPU performance simulator and analyze their effect on performance, power and energy. Our evaluation shows that ONAC and Seq-Det can reduce energy consumption by 20% and 10% on average, respectively, for memory-intensive applications, without sacrificing more than 2% performance. The higher energy savings for ONAC come from reducing the detection time by 45% compared to Seq-Det.

Citations: 4
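The notion of a saturation point (Nopt) in the abstract above can be illustrated with a naive step-by-step search in Python. This is neither ONAC's estimation model nor the paper's Seq-Det implementation: the `measure` callback, the `min_gain` threshold, and the stopping rule are assumptions made for the example.

```python
from typing import Callable


def detect_saturation_point(
    measure: Callable[[int], float],
    max_cores: int,
    min_gain: float = 0.02,
) -> int:
    """Return the smallest core count after which performance stops improving.

    `measure(n)` is a hypothetical callback returning the performance observed
    with n active cores. Cores are enabled one at a time; the search stops as
    soon as adding a core improves performance by less than `min_gain`
    (a fraction, so 0.02 means 2%).
    """
    if max_cores < 1:
        raise ValueError("max_cores must be at least 1")
    best = measure(1)
    for cores in range(2, max_cores + 1):
        current = measure(cores)
        if current < best * (1.0 + min_gain):
            return cores - 1  # the extra core did not pay off
        best = current
    return max_cores  # performance never saturated
```

Because this search takes one sample per core count, its cost grows with the saturation point, which is the kind of detection overhead the abstract says an estimation model can reduce.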