ACM Trans. Design Autom. Electr. Syst.最新文献

筛选
英文 中文
Scheduling Globally Asynchronous Locally Synchronous Programs for Guaranteed Response Times 调度全局异步本地同步程序保证响应时间
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2740961
Heejong Park, Avinash Malik, Z. Salcic
{"title":"Scheduling Globally Asynchronous Locally Synchronous Programs for Guaranteed Response Times","authors":"Heejong Park, Avinash Malik, Z. Salcic","doi":"10.1145/2740961","DOIUrl":"https://doi.org/10.1145/2740961","url":null,"abstract":"Safety-critical software systems need to guarantee functional correctness and bounded response times to external input events. Programs designed using reactive programming languages, based on formal mathematical semantics, can be automatically verified for functional correctness guarantees. Real-time guarantees on the other hand are much harder to achieve. In this article we provide a static analysis framework for guaranteeing response times for reactive programs developed using the Globally Asynchronous Locally Synchronous (GALS) model of computation. The proposed approach is applicable to scheduling of GALS programs for different target architectures with single or multiple processors or cores. A Satisfiability Modulo Theory (SMT) formulation in the quantifier free linear real arithmetic (QF_LRA) logic is used for scheduling. A novel technique to encode rendezvous used in synchronization of globally asynchronous processes in the presence of locally synchronous parallelism and arbitrary preemption into QF_LRA logic is presented. Finally, our SMT formulation is shown to produce schedules in reasonable time.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"40 1","pages":"40:1-40:25"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91285539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Design of Ultra-Low Power Scalable-Throughput Many-Core DSP Applications 超低功耗可扩展吞吐量多核DSP应用的设计
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2720018
Meeta Srivastav, M. Ehteshamuddin, K. Stegner, L. Nazhandali
{"title":"Design of Ultra-Low Power Scalable-Throughput Many-Core DSP Applications","authors":"Meeta Srivastav, M. Ehteshamuddin, K. Stegner, L. Nazhandali","doi":"10.1145/2720018","DOIUrl":"https://doi.org/10.1145/2720018","url":null,"abstract":"We propose a system-level solution in designing process variation aware (PVA) scalable-throughput many-core systems for energy constrained applications. In our proposed methodology, we leverage the benefits of voltage scaling for obtaining energy efficiency while compensating for the loss in throughput by exploiting parallelism present in various DSP designs. We demonstrate that such a hybrid method consumes 6.27%- 28.15% less power as compared to simple dynamic voltage scaling over different workload environments. Design details of a prototype chip fabricated on 90nm technology node and its findings are presented. Chip consists of 8 homogeneous FIR cores, which are capable of running from near-threshold to nominal voltages. In our 20 chip population, we observe 7% variation in speed among the cores at nominal voltage (0.9V) and 26% at near threshold voltage (0.55V). We also observe 54% variation in power consumption of the cores. For any desired throughput, the optimum number of cores and their optimum operating voltage is chosen based on the speed and power characteristics of the cores present inside the chip. We will also present analysis on energy-efficiency of such systems based on changes in ambient temperature.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"8 1","pages":"34:1-34:21"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84156127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
System-Level Observation Framework for Non-Intrusive Runtime Monitoring of Embedded Systems 嵌入式系统非侵入式运行时监控的系统级观察框架
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2717310
Jong Chul Lee
{"title":"System-Level Observation Framework for Non-Intrusive Runtime Monitoring of Embedded Systems","authors":"Jong Chul Lee","doi":"10.1145/2717310","DOIUrl":"https://doi.org/10.1145/2717310","url":null,"abstract":"As the complexity of embedded systems rapidly increases, the use of traditional analysis and debug methods encounters significant challenges in monitoring, analyzing, and debugging the complex interactions of various software and hardware components. This situation is further exacerbated for in-situ debugging and verification in which traditional debug and trace interfaces that require physical access are unavailable, infeasible, or cost prohibitive. In this article, we present a system-level observation framework that provides minimally intrusive methods for dynamically monitoring and analyzing deeply integrated hardware and software components within embedded systems. The system-level observation framework monitors hardware and software events by inserting additional logic for detecting designer-specified events within hardware cores to observe complex interaction across hardware and software boundaries at runtime, and provides visibility for monitoring complex execution behavior of software applications without affecting the system execution.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"45 1","pages":"42:1-42:27"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86414564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
High-Throughput Logic Timing Simulation on GPGPUs 基于gpgpu的高吞吐量逻辑时序仿真
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2714564
S. Holst, M. Imhof, H. Wunderlich
{"title":"High-Throughput Logic Timing Simulation on GPGPUs","authors":"S. Holst, M. Imhof, H. Wunderlich","doi":"10.1145/2714564","DOIUrl":"https://doi.org/10.1145/2714564","url":null,"abstract":"Many EDA tasks such as test set characterization or the precise estimation of power consumption, power droop and temperature development, require a very large number of time-aware gate-level logic simulations. Until now, such characterizations have been feasible only for rather small designs or with reduced precision due to the high computational demands.\u0000 The new simulation system presented here is able to accelerate such tasks by more than two orders of magnitude and provides for the first time fast and comprehensive timing simulations for industrial-sized designs. Hazards, pulse-filtering, and pin-to-pin delay are supported for the first time in a GPGPU accelerated simulator, and the system can easily be extended to even more realistic delay models and further applications.\u0000 A sophisticated mapping with efficient memory utilization and access patterns as well as minimal synchronizations and control flow divergence is able to use the full potential of GPGPU architectures. To provide such a mapping, we combine for the first time the versatility of event-based timing simulation and multi-dimensional parallelism used in GPU-based gate-level simulators. The result is a throughput-optimized timing simulation algorithm, which runs many simulation instances in parallel and at the same time fully exploits gate-parallelism within the circuit.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"15 1","pages":"37:1-37:22"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75603542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
Explaining Software Failures by Cascade Fault Localization 通过级联故障定位解释软件故障
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2738038
Qiuping Yi, Z. Yang, Jian Liu, Chen Zhao, Chao Wang
{"title":"Explaining Software Failures by Cascade Fault Localization","authors":"Qiuping Yi, Z. Yang, Jian Liu, Chen Zhao, Chao Wang","doi":"10.1145/2738038","DOIUrl":"https://doi.org/10.1145/2738038","url":null,"abstract":"During software debugging, a significant amount of effort is required for programmers to identify the root cause of a manifested failure. In this article, we propose a cascade fault localization method to help speed up this labor-intensive process via a combination of weakest precondition computation and constraint solving. Our approach produces a cause tree, where each node is a potential cause of the failure and each edge represents a casual relationship between two causes. There are two main contributions of this article that differentiate our approach from existing methods. First, our method systematically computes all potential causes of a failure and augments each cause with a proper context for ease of comprehension by the user. Second, our method organizes the potential causes in a tree structure to enable on-the-fly pruning based on domain knowledge and feedback from the user. We have implemented our new method in a software tool called CaFL, which builds upon the LLVM compiler and KLEE symbolic virtual machine. We have conducted experiments on a large set of public benchmarks, including real applications from GNU Coreutils and Busybox. Our results show that in most cases the user has to examine only a small fraction of the execution trace before identifying the root cause of the failure.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"114 1","pages":"41:1-41:28"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87943732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Least Upper Delay Bound for VBR Flows in Networks-on-Chip with Virtual Channels 带虚拟信道的片上网络中VBR流的最小上延迟界
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2733374
Fahimeh Jafari, Zhonghai Lu, A. Jantsch
{"title":"Least Upper Delay Bound for VBR Flows in Networks-on-Chip with Virtual Channels","authors":"Fahimeh Jafari, Zhonghai Lu, A. Jantsch","doi":"10.1145/2733374","DOIUrl":"https://doi.org/10.1145/2733374","url":null,"abstract":"Real-time applications such as multimedia and gaming require stringent performance guarantees, usually enforced by a tight upper bound on the maximum end-to-end delay. For FIFO multiplexed on-chip packet switched networks we consider worst-case delay bounds for Variable Bit-Rate (VBR) flows with aggregate scheduling, which schedules multiple flows as an aggregate flow. VBR Flows are characterized by a maximum transfer size (L), peak rate (p), burstiness (σ), and average sustainable rate (ρ). Based on network calculus, we present and prove theorems to derive per-flow end-to-end Equivalent Service Curves (ESC), which are in turn used for computing Least Upper Delay Bounds (LUDBs) of individual flows. In a realistic case study we find that the end-to-end delay bound is up to 46.9% more accurate than the case without considering the traffic peak behavior. Likewise, results also show similar improvements for synthetic traffic patterns. The proposed methodology is implemented in C++ and has low run-time complexity, enabling quick evaluation for large and complex SoCs.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"221 1","pages":"35:1-35:33"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89136574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Array Interleaving—An Energy-Efficient Data Layout Transformation 数组交错——一种节能的数据布局转换
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2747875
Namita Sharma, P. Panda, F. Catthoor, P. Raghavan, T. Aa
{"title":"Array Interleaving—An Energy-Efficient Data Layout Transformation","authors":"Namita Sharma, P. Panda, F. Catthoor, P. Raghavan, T. Aa","doi":"10.1145/2747875","DOIUrl":"https://doi.org/10.1145/2747875","url":null,"abstract":"Optimizations related to memory accesses and data storage make a significant difference to the performance and energy of a wide range of data-intensive applications. These techniques need to evolve with modern architectures supporting wide memory accesses. We investigate array interleaving, a data layout transformation technique that achieves energy efficiency by combining the storage of data elements from multiple arrays in contiguous locations, in an attempt to exploit spatial locality. The transformation reduces the number of memory accesses by loading the right set of data into vector registers, thereby minimizing redundant memory fetches. We perform a global analysis of array accesses, and account for possibly different array behavior in different loop nests that might ultimately lead to changes in data layout decisions for the same array across program regions. Our technique relies on detailed estimates of the savings due to interleaving, and also the cost of performing the actual data layout modifications. We also account for the vector register widths and the possibility of choosing the appropriate granularity for interleaving. Experiments on several benchmarks show a 6--34% reduction in memory energy due to the strategy.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"38 1","pages":"44:1-44:26"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86659632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Lazy-RTGC: A Real-Time Lazy Garbage Collection Mechanism with Jointly Optimizing Average and Worst Performance for NAND Flash Memory Storage Systems Lazy- rtgc:一种联合优化NAND闪存存储系统平均和最差性能的实时惰性垃圾收集机制
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2746236
Qi Zhang, Xuandong Li, Linzhang Wang, Tian Zhang, Yi Wang, Z. Shao
{"title":"Lazy-RTGC: A Real-Time Lazy Garbage Collection Mechanism with Jointly Optimizing Average and Worst Performance for NAND Flash Memory Storage Systems","authors":"Qi Zhang, Xuandong Li, Linzhang Wang, Tian Zhang, Yi Wang, Z. Shao","doi":"10.1145/2746236","DOIUrl":"https://doi.org/10.1145/2746236","url":null,"abstract":"Due to many attractive and unique properties, NAND flash memory has been widely adopted in mission-critical hard real-time systems and some soft real-time systems. However, the nondeterministic garbage collection operation in NAND flash memory makes it difficult to predict the system response time of each data request. This article presents Lazy-RTGC, a real-time lazy garbage collection mechanism for NAND flash memory storage systems. Lazy-RTGC adopts two design optimization techniques: on-demand page-level address mappings, and partial garbage collection. On-demand page-level address mappings can achieve high performance of address translation and can effectively manage the flash space with the minimum RAM cost. On the other hand, partial garbage collection can provide the guaranteed system response time. By adopting these techniques, Lazy-RTGC jointly optimizes both the average and the worst system response time, and provides a lower bound of reclaimed free space. Lazy-RTGC is implemented in FlashSim and compared with representative real-time NAND flash memory management schemes. Experimental results show that our technique can significantly improve both the average and worst system performance with very low extra flash-space requirements.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"11 1","pages":"43:1-43:32"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81793545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 24
Decoupling Capacitance Design Strategies for Power Delivery Networks with Power Gating 功率门控输电网的去耦电容设计策略
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2700825
Tong Xu, Peng Li, S. Sundareswaran
{"title":"Decoupling Capacitance Design Strategies for Power Delivery Networks with Power Gating","authors":"Tong Xu, Peng Li, S. Sundareswaran","doi":"10.1145/2700825","DOIUrl":"https://doi.org/10.1145/2700825","url":null,"abstract":"Power gating is a widely used leakage power saving strategy in modern chip designs. However, power gating introduces unique power integrity issues and trade-offs between switching and rush current (wake-up) supply noises. At the same time, the amount of power saving intrinsically trades off with power integrity. In addition, these trade-offs significantly vary with supply voltage. In this article, we propose systemic decoupling capacitors (decaps) optimization strategies that optimally trade-off between power integrity and leakage saving. Specially, new global decap and reroutable decap design concepts are proposed to relax the tight interaction between power integrity and leakage saving of power gated PDNs with a single supply voltage level. Furthermore, we propose a flexible decap allocation technique to deal with the design trade-offs under multiple supply voltage levels. The proposed strategies are implemented in an automatic design flow for choosing the optimal amount of local decaps, global decaps and reroutable decaps. The conducted experiments demonstrate that leakage saving can be increased significantly compared with the conventional PDN design approach with a single supply voltage level using the proposed techniques without jeopardizing power integrity. For PDN designs operating at two supply voltage levels, the optimal performance is achieved at each voltage level.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"45 1","pages":"38:1-38:30"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87147156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
A Methodology to Recover RTL IP Functionality for Automatic Generation of SW Applications 一种用于软件应用程序自动生成的RTL IP功能恢复方法
ACM Trans. Design Autom. Electr. Syst. Pub Date : 2015-06-24 DOI: 10.1145/2720019
N. Bombieri, F. Fummi, S. Vinco
{"title":"A Methodology to Recover RTL IP Functionality for Automatic Generation of SW Applications","authors":"N. Bombieri, F. Fummi, S. Vinco","doi":"10.1145/2720019","DOIUrl":"https://doi.org/10.1145/2720019","url":null,"abstract":"With the advent of heterogeneous multiprocessor system-on-chips (MPSoCs), hardware/software partitioning is again on the rise both in research and in product development. In this new scenario, implementing intellectual-property (IP) blocks as SW applications rather than dedicated HW is an increasing trend to fully exploit the computation power provided by the MPSoC CPUs. On the other hand, whole libraries of IP blocks are available as RTL descriptions, most of them without a corresponding high-level SW implementation. In this context, this article presents a methodology to automatically generate SW applications in C++, by starting from existing RTL IPs implemented in hardware description language (HDL). The methodology exploits an abstraction algorithm to eliminate implementation details typical of HW descriptions (such as cycle-accurate functionality and data types) to guarantee relevant performance of the generated code. The experimental results show that, in many cases, the C++ code automatically generated in a few seconds with the proposed methodology is as efficient as the corresponding code manually implemented from scratch.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"40 1","pages":"36:1-36:26"},"PeriodicalIF":0.0,"publicationDate":"2015-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82643898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信