ACM Trans. Design Autom. Electr. Syst.: Latest Articles, Page 9

Scheduling Globally Asynchronous Locally Synchronous Programs for Guaranteed Response Times

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2740961

Heejong Park, Avinash Malik, Z. Salcic

Citations: 4

Design of Ultra-Low Power Scalable-Throughput Many-Core DSP Applications

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2720018

Meeta Srivastav, M. Ehteshamuddin, K. Stegner, L. Nazhandali

Abstract: We propose a system-level solution for designing process-variation-aware (PVA) scalable-throughput many-core systems for energy-constrained applications. In our proposed methodology, we leverage the benefits of voltage scaling to obtain energy efficiency while compensating for the loss in throughput by exploiting the parallelism present in various DSP designs. We demonstrate that such a hybrid method consumes 6.27% to 28.15% less power than simple dynamic voltage scaling over different workload environments. Design details of a prototype chip fabricated on a 90nm technology node and its findings are presented. The chip consists of 8 homogeneous FIR cores, which are capable of running from near-threshold to nominal voltages. In our 20-chip population, we observe 7% variation in speed among the cores at nominal voltage (0.9V) and 26% at near-threshold voltage (0.55V). We also observe 54% variation in power consumption of the cores. For any desired throughput, the optimum number of cores and their optimum operating voltage are chosen based on the speed and power characteristics of the cores present inside the chip. We also present an analysis of the energy efficiency of such systems under changes in ambient temperature.

Pages: 34:1-34:21

Citations: 1
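
The selection step mentioned in the many-core DSP abstract above (choosing how many cores to run, and at which voltage, for a desired throughput) can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions, not the authors' method: the core names, the throughput and power figures, the restriction to two voltage levels, and the rule that all active cores share one voltage are all hypothetical.

```python
from itertools import combinations

# Hypothetical per-core measurements (NOT data from the paper):
# CORES[name][voltage] = (throughput, power), both in arbitrary integer units.
CORES = {
    "c0": {0.55: (10, 100), 0.9: (40, 900)},
    "c1": {0.55: (12, 130), 0.9: (42, 1000)},
    "c2": {0.55: (9, 90), 0.9: (39, 850)},
}
VOLTAGES = (0.55, 0.9)


def best_config(target):
    """Return (power, voltage, cores) with minimum total power whose
    summed throughput meets the target, or None if no choice does.

    Assumption: every active core runs at the same voltage.
    """
    best = None
    names = sorted(CORES)
    for v in VOLTAGES:
        for k in range(1, len(names) + 1):
            for subset in combinations(names, k):
                tput = sum(CORES[c][v][0] for c in subset)
                power = sum(CORES[c][v][1] for c in subset)
                if tput >= target and (best is None or power < best[0]):
                    best = (power, v, subset)
    return best


print(best_config(20))  # two slow cores at near-threshold voltage win
print(best_config(35))  # near-threshold cannot reach 35, so one core at 0.9 V
```

With these made-up figures, a low target is met most cheaply by several cores at the lower voltage, while a higher target forces the higher voltage. Exhaustive search is used here only because the example is tiny.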

System-Level Observation Framework for Non-Intrusive Runtime Monitoring of Embedded Systems

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2717310

Jong Chul Lee

Citations: 11

High-Throughput Logic Timing Simulation on GPGPUs

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2714564

S. Holst, M. Imhof, H. Wunderlich

Abstract: Many EDA tasks such as test set characterization or the precise estimation of power consumption, power droop and temperature development, require a very large number of time-aware gate-level logic simulations. Until now, such characterizations have been feasible only for rather small designs or with reduced precision due to the high computational demands. The new simulation system presented here is able to accelerate such tasks by more than two orders of magnitude and provides for the first time fast and comprehensive timing simulations for industrial-sized designs. Hazards, pulse-filtering, and pin-to-pin delay are supported for the first time in a GPGPU accelerated simulator, and the system can easily be extended to even more realistic delay models and further applications. A sophisticated mapping with efficient memory utilization and access patterns as well as minimal synchronizations and control flow divergence is able to use the full potential of GPGPU architectures. To provide such a mapping, we combine for the first time the versatility of event-based timing simulation and multi-dimensional parallelism used in GPU-based gate-level simulators. The result is a throughput-optimized timing simulation algorithm, which runs many simulation instances in parallel and at the same time fully exploits gate-parallelism within the circuit.

Pages: 37:1-37:22

Citations: 31

Least Upper Delay Bound for VBR Flows in Networks-on-Chip with Virtual Channels

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2733374

Fahimeh Jafari, Zhonghai Lu, A. Jantsch

Citations: 7

Explaining Software Failures by Cascade Fault Localization

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2738038

Qiuping Yi, Z. Yang, Jian Liu, Chen Zhao, Chao Wang

Abstract: During software debugging, a significant amount of effort is required for programmers to identify the root cause of a manifested failure. In this article, we propose a cascade fault localization method to help speed up this labor-intensive process via a combination of weakest precondition computation and constraint solving. Our approach produces a cause tree, where each node is a potential cause of the failure and each edge represents a causal relationship between two causes. There are two main contributions of this article that differentiate our approach from existing methods. First, our method systematically computes all potential causes of a failure and augments each cause with a proper context for ease of comprehension by the user. Second, our method organizes the potential causes in a tree structure to enable on-the-fly pruning based on domain knowledge and feedback from the user. We have implemented our new method in a software tool called CaFL, which builds upon the LLVM compiler and KLEE symbolic virtual machine. We have conducted experiments on a large set of public benchmarks, including real applications from GNU Coreutils and Busybox. Our results show that in most cases the user has to examine only a small fraction of the execution trace before identifying the root cause of the failure.

Pages: 41:1-41:28

Citations: 10

Decoupling Capacitance Design Strategies for Power Delivery Networks with Power Gating

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2700825

Tong Xu, Peng Li, S. Sundareswaran

Abstract: Power gating is a widely used leakage power saving strategy in modern chip designs. However, power gating introduces unique power integrity issues and trade-offs between switching and rush current (wake-up) supply noises. At the same time, the amount of power saving intrinsically trades off with power integrity. In addition, these trade-offs vary significantly with supply voltage. In this article, we propose systemic decoupling capacitor (decap) optimization strategies that optimally trade off power integrity against leakage saving. Specifically, new global decap and reroutable decap design concepts are proposed to relax the tight interaction between power integrity and leakage saving of power gated PDNs with a single supply voltage level. Furthermore, we propose a flexible decap allocation technique to deal with the design trade-offs under multiple supply voltage levels. The proposed strategies are implemented in an automatic design flow for choosing the optimal amount of local decaps, global decaps and reroutable decaps. The conducted experiments demonstrate that, with the proposed techniques, leakage saving can be increased significantly compared with the conventional PDN design approach with a single supply voltage level, without jeopardizing power integrity. For PDN designs operating at two supply voltage levels, the optimal performance is achieved at each voltage level.

Pages: 38:1-38:30

Citations: 2

Lazy-RTGC: A Real-Time Lazy Garbage Collection Mechanism with Jointly Optimizing Average and Worst Performance for NAND Flash Memory Storage Systems

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2746236

Qi Zhang, Xuandong Li, Linzhang Wang, Tian Zhang, Yi Wang, Z. Shao

Abstract: Due to many attractive and unique properties, NAND flash memory has been widely adopted in mission-critical hard real-time systems and some soft real-time systems. However, the nondeterministic garbage collection operation in NAND flash memory makes it difficult to predict the system response time of each data request. This article presents Lazy-RTGC, a real-time lazy garbage collection mechanism for NAND flash memory storage systems. Lazy-RTGC adopts two design optimization techniques: on-demand page-level address mappings, and partial garbage collection. On-demand page-level address mappings can achieve high performance of address translation and can effectively manage the flash space with the minimum RAM cost. On the other hand, partial garbage collection can provide the guaranteed system response time. By adopting these techniques, Lazy-RTGC jointly optimizes both the average and the worst system response time, and provides a lower bound of reclaimed free space. Lazy-RTGC is implemented in FlashSim and compared with representative real-time NAND flash memory management schemes. Experimental results show that our technique can significantly improve both the average and worst system performance with very low extra flash-space requirements.

Pages: 43:1-43:32

Citations: 24

Array Interleaving—An Energy-Efficient Data Layout Transformation

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2747875

Namita Sharma, P. Panda, F. Catthoor, P. Raghavan, T. Aa

Abstract: Optimizations related to memory accesses and data storage make a significant difference to the performance and energy of a wide range of data-intensive applications. These techniques need to evolve with modern architectures supporting wide memory accesses. We investigate array interleaving, a data layout transformation technique that achieves energy efficiency by combining the storage of data elements from multiple arrays in contiguous locations, in an attempt to exploit spatial locality. The transformation reduces the number of memory accesses by loading the right set of data into vector registers, thereby minimizing redundant memory fetches. We perform a global analysis of array accesses, and account for possibly different array behavior in different loop nests that might ultimately lead to changes in data layout decisions for the same array across program regions. Our technique relies on detailed estimates of the savings due to interleaving, and also the cost of performing the actual data layout modifications. We also account for the vector register widths and the possibility of choosing the appropriate granularity for interleaving. Experiments on several benchmarks show a 6-34% reduction in memory energy due to the strategy.

Pages: 44:1-44:26

Citations: 8
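
The layout idea in the array interleaving abstract above (storing elements of several arrays in contiguous locations so that one wide access fetches values used together) can be shown with a minimal index-mapping sketch. It assumes two equal-length arrays and an interleaving granularity of one element; the function names are hypothetical, and Python lists only illustrate the index arithmetic, not real memory behavior or energy savings.

```python
def interleave(a, b):
    """Merge two equal-length arrays so a[i] and b[i] become neighbours."""
    if len(a) != len(b):
        raise ValueError("arrays must have equal length")
    merged = [0] * (2 * len(a))
    merged[0::2] = a  # a[i] is stored at index 2*i
    merged[1::2] = b  # b[i] is stored at index 2*i + 1
    return merged


def dot_separate(a, b):
    # Original layout: every iteration reads from two distinct arrays.
    return sum(a[i] * b[i] for i in range(len(a)))


def dot_interleaved(merged):
    # Interleaved layout: both operands of an iteration are adjacent,
    # so a single wide (vector) load could fetch them together.
    return sum(merged[2 * i] * merged[2 * i + 1]
               for i in range(len(merged) // 2))


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
ab = interleave(a, b)
print(ab)                   # [1, 10, 2, 20, 3, 30, 4, 40]
print(dot_interleaved(ab))  # 300, same result as dot_separate(a, b)
```

Whether interleaving pays off for a given loop nest depends on the access pattern and on the cost of converting the layout, which is the trade-off the abstract says the analysis estimates.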

A Methodology to Recover RTL IP Functionality for Automatic Generation of SW Applications

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2720019

N. Bombieri, F. Fummi, S. Vinco

Citations: 5