ACM Trans. Design Autom. Electr. Syst.: Latest Articles, Page 6

FORTIS: A Comprehensive Solution for Establishing Forward Trust for Protecting IPs and ICs

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2893183

Ujjwal Guin, Qihang Shi, Domenic Forte, M. Tehranipoor

With the advent of globalization in the semiconductor industry, it is necessary to prevent unauthorized usage of third-party IPs (3PIPs), cloning and unwanted modification of 3PIPs, and unauthorized production of ICs. Due to the increasing complexity of ICs, system-on-chip (SoC) designers use various 3PIPs in their designs to reduce time-to-market and development costs, which creates a trust issue between the SoC designer and the IP owners. In addition, as ICs are fabricated around the globe, SoC designers give fabrication contracts to offshore foundries to manufacture ICs and have little control over the fabrication process, including the total number of chips fabricated. Similarly, 3PIP owners lack control over the number of fabricated chips and/or the usage of their IPs in an SoC. Existing research only partially addresses the problems of IP piracy and IC overproduction, and to the best of our knowledge, there is no work that considers IP overuse. In this article, we present a comprehensive solution for preventing IP piracy and IC overproduction by assuring forward trust between all entities involved in the SoC design and fabrication process. We propose a novel design flow to prevent IC overproduction and IP overuse. We use an existing logic encryption technique to obfuscate the netlist of an SoC or a 3PIP and propose a modification that enables manufacturing tests before the activation of chips, which is necessary to prevent overproduction. We use asymmetric and symmetric key encryption, in a fashion similar to Pretty Good Privacy (PGP), to transfer keys from the SoC designer or 3PIP owners to the chips. In addition, we propose attaching an IP digest (a cryptographic hash of the entire IP) to the header of an IP to prevent modification of the IP by SoC designers. We show that our approach is resistant to various attacks at the cost of minimal area overhead.

Pages: 63:1-63:20

Citations: 56
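The IP-digest idea in the FORTIS abstract can be illustrated with a minimal Python sketch. It assumes SHA-256 as the hash and a hypothetical one-line `IP-DIGEST:` header; the abstract specifies neither, so this is not the FORTIS format. An unkeyed digest only detects tampering if the header itself cannot be rewritten, so a real flow would also have to protect the digest (for example with a signature); the abstract does not describe that part.

```python
import hashlib
import hmac

# Hypothetical header layout (not from FORTIS): one line "IP-DIGEST: <hex>"
# followed by the IP body, e.g. a netlist.
HEADER_PREFIX = "IP-DIGEST: "


def ip_digest(ip_body: str) -> str:
    """SHA-256 over the entire IP body (the hash choice is an assumption)."""
    return hashlib.sha256(ip_body.encode("utf-8")).hexdigest()


def attach_digest(ip_body: str) -> str:
    """IP owner: prepend the digest of the body as a header line."""
    return f"{HEADER_PREFIX}{ip_digest(ip_body)}\n{ip_body}"


def verify_ip(packaged_ip: str) -> bool:
    """Recipient: recompute the body digest and compare it with the header."""
    header, _, body = packaged_ip.partition("\n")
    if not header.startswith(HEADER_PREFIX):
        return False
    claimed = header[len(HEADER_PREFIX):]
    # Constant-time comparison; bytes avoid errors on non-ASCII headers.
    return hmac.compare_digest(claimed.encode("utf-8"),
                               ip_digest(body).encode("ascii"))


netlist = "module top(input a, b, output y); assign y = a & b; endmodule\n"
packaged = attach_digest(netlist)
print(verify_ip(packaged))                            # True
print(verify_ip(packaged.replace("a & b", "a | b")))  # False: body was modified
```

Any edit to the body changes the recomputed hash, so the comparison fails.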

Streaming Sorting Networks

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2854150

M. Zuluaga, Peter Milder, Markus Püschel

Sorting is a fundamental problem in computer science and has been studied extensively. Thus, a large variety of sorting methods exist for both software and hardware implementations. For the latter, there is a trade-off between the throughput achieved and the cost (i.e., the logic and storage invested to sort n elements). Two popular solutions are bitonic sorting networks with O(n log² n) logic and storage, which sort n elements per cycle, and linear sorters with O(n) logic and storage, which sort n elements per n cycles. In this article, we present new hardware structures that we call streaming sorting networks, which we derive through a mathematical formalism that we introduce, and an accompanying domain-specific hardware generator that translates our formal mathematical description into synthesizable RTL Verilog. With the new networks, we achieve novel and improved cost-performance trade-offs. For example, assuming that n is a power of two and w is any divisor of n, one class of these networks can sort in n/w cycles with O(w log² n) logic and O(n log² n) storage; the other class that we present sorts in (n log² n)/w cycles with O(w) logic and O(n) storage. We carefully analyze the performance of these networks and their cost at three levels of abstraction: (1) asymptotically, (2) exactly in terms of the number of basic elements needed, and (3) in terms of the resources required by the actual circuit when mapped to a field-programmable gate array. The accompanying hardware generator allows us to explore the entire design space, identify the Pareto-optimal solutions, and show superior cost-performance trade-offs compared to prior work.

Pages: 55:1-55:30

Citations: 34
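As background for the Streaming Sorting Networks entry, here is a minimal Python sketch of the fully parallel bitonic sorting network the abstract uses as a baseline (n a power of two). It checks that the network has log n (log n + 1)/2 comparator stages of n/2 comparators each, which is the O(n log² n) logic cost quoted above. It does not model the article's streaming (folded) structures or its hardware generator.

```python
def bitonic_comparators(n):
    """Comparator stages of Batcher's bitonic sorting network for n = 2**m.

    Each stage is a list of (i, j, ascending) comparators on disjoint wires,
    so all comparators in one stage could operate in parallel.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = []
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # compare-exchange distance inside the merge
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    stage.append((i, partner, (i & k) == 0))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages


def apply_network(stages, data):
    """Run the comparators in order; each one orders its pair of wires."""
    out = list(data)
    for stage in stages:
        for i, j, ascending in stage:
            if (ascending and out[i] > out[j]) or (not ascending and out[i] < out[j]):
                out[i], out[j] = out[j], out[i]
    return out


stages = bitonic_comparators(8)
print(len(stages), sum(len(s) for s in stages))          # 6 24
print(apply_network(stages, [5, 1, 7, 3, 0, 6, 2, 4]))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

For n = 8 this gives 6 stages and 24 comparators, matching 8 · 3 · 4 / 4.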

Power, Area, and Performance Optimization of Standard Cell Memory Arrays Through Controlled Placement

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2890498

A. Teman, D. Rossi, P. Meinerzhagen, L. Benini, A. Burg

Embedded memory remains a major bottleneck in current integrated circuit design in terms of silicon area, power dissipation, and performance; however, static random access memories (SRAMs) are almost exclusively supplied by a small number of vendors through memory generators, targeted at rather generic design specifications. As an alternative, standard cell memories (SCMs) can be defined, synthesized, and placed and routed as an integral part of a given digital system, providing complete design flexibility, good energy efficiency, low-voltage operation, and even area efficiency for small memory blocks. Yet implementing an SCM block with a standard digital flow often fails to exploit the distinct and regular structure of such an array, leaving room for optimization. In this article, we present a design methodology for optimizing the physical implementation of SCM macros as part of the standard design flow. This methodology introduces controlled placement, leading to a structured, noncongested layout with close to 100% placement utilization, resulting in a smaller silicon footprint, reduced wire length, and lower power consumption compared to SCMs without controlled placement. This methodology is demonstrated on SCM macros of various sizes and aspect ratios in a state-of-the-art 28 nm fully depleted silicon-on-insulator technology, and compared with equivalent macros designed with the noncontrolled, standard flow, as well as with foundry-supplied SRAM macros. The controlled SCMs provide an average 25% reduction in area as compared to noncontrolled implementations while achieving a smaller size than SRAM macros of up to 1 Kbyte. Power and performance comparisons of controlled SCM blocks of a commonly found 256 × 32 (1 Kbyte) memory with foundry-provided SRAMs show greater than 65% and 10% reduction in read and write power, respectively, while providing faster access than their SRAM counterparts, despite being of an aspect ratio that is typically unfavorable for SCMs. In addition, the SCM blocks function correctly with a supply voltage as low as 0.3 V, well below the lower limit of even the SRAM macros optimized for low-voltage operation. The controlled placement methodology is applied within a full-chip physical implementation flow of an OpenRISC-based test chip, providing more than 50% power reduction compared to equivalently sized compiled SRAMs under a benchmark application.

Pages: 59:1-59:25

Citations: 48

Resource Sharing Centric Dynamic Voltage and Frequency Scaling for CMP Cores, Uncore, and Memory

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2897394

Jae-Yeon Won, Paul V. Gratz, S. Shakkottai, Jiang Hu

With the breakdown of Dennard's scaling over the past decade, performance growth of modern microprocessor design has largely relied on scaling core count in chip multiprocessors (CMPs). The challenge of chip power density, however, remains and demands new power management solutions. This work investigates a coordinated CMP systemwide Dynamic Voltage and Frequency Scaling (DVFS) policy centered around shared resource utilization. This approach represents a new angle on the problem, differing from the conventional core-workload-driven approaches. The key component of our work is per-core DVFS leveraging a technique similar to TCP Vegas congestion control from networking. This TCP Vegas-based DVFS can potentially identify the synergy between power reduction and performance improvement. Further, this work includes uncore (on-chip interconnect and shared last level cache) and main memory DVFS policies coordinated with the per-core DVFS policy. Full system simulations on PARSEC benchmarks show that our technique reduces total energy dissipation by over 47% across all benchmarks with less than 2.3% performance degradation. Our work also leads to 12% more energy savings compared to a prior CMP DVFS policy.

Pages: 69:1-69:25

Citations: 3
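The DVFS abstract builds its per-core policy on "a technique similar to TCP Vegas congestion control" without stating the mapping. For reference, the Vegas rule from networking itself is sketched below in Python: it compares expected throughput (window / minimum RTT) with actual throughput (window / measured RTT) and grows, shrinks, or holds the window depending on whether the estimated backlog falls below alpha, above beta, or in between. The alpha and beta values are illustrative, and how the article maps window, RTT, and backlog onto core frequency and shared-resource contention is not shown here.

```python
def vegas_update(cwnd: int, base_rtt: float, rtt: float,
                 alpha: float = 1.0, beta: float = 3.0) -> int:
    """One TCP Vegas window adjustment, applied once per round-trip time.

    base_rtt: smallest RTT observed (no queuing); rtt: current measured RTT.
    alpha/beta: lower/upper bounds on tolerated backlog, in packets
    (illustrative values, not taken from the article).
    """
    expected = cwnd / base_rtt                 # throughput if nothing were queued
    actual = cwnd / rtt                        # throughput actually achieved
    backlog = (expected - actual) * base_rtt   # estimated packets sitting in queues
    if backlog < alpha:
        return cwnd + 1                        # spare capacity: increase
    if backlog > beta:
        return max(1, cwnd - 1)                # queues building: decrease
    return cwnd                                # inside [alpha, beta]: hold


print(vegas_update(10, base_rtt=100.0, rtt=100.0))  # 11 (no queuing)
print(vegas_update(10, base_rtt=100.0, rtt=125.0))  # 10 (backlog about 2 packets)
print(vegas_update(10, base_rtt=100.0, rtt=200.0))  # 9  (backlog about 5 packets)
```

The appeal of the rule is that it reacts to growing delay (a queue forming) before any loss occurs, which is the kind of early contention signal a shared-resource-centric policy would want.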

Area-Aware Decomposition for Single-Electron Transistor Arrays

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2898998

C. Ho, Yung-Chih Chen, Chun-Yao Wang, Ching-Yi Huang, S. Datta, N. Vijaykrishnan

The single-electron transistor (SET) operating at room temperature has been demonstrated as a promising device for extending Moore's law due to its ultra-low power consumption. Existing SET synthesis methods synthesize a Boolean network into a large reconfigurable SET array where the height of the SET array equals the number of primary inputs. However, recent experiments at the device level have shown that this height is restricted to a small number, say, 10, rather than an arbitrary value, due to the ultra-low driving strength of SET devices. On the other hand, the width of an SET array is also suggested to be a small value. Consequently, it is necessary to decompose a large SET array into a set of small SET arrays, each of which realizes a sub-function of the original circuit with no more than 10 inputs. Thus, this article presents two techniques for achieving area-efficient SET array decomposition: one is a width minimization algorithm for reducing the area of a single SET array; the other is a depth-bounded mapping algorithm, which decomposes a Boolean network into many sub-functions such that the widths of the corresponding SET arrays are balanced. The width minimization algorithm leads to a 25%-41% improvement compared to the state of the art, and the mapping algorithm achieves a 60% reduction in total area compared to a naïve approach.

Pages: 70:1-70:20

Citations: 3
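The constraint behind the SET decomposition, that every small SET array implements a sub-function of at most about 10 inputs, is the same one that k-feasible cuts capture in conventional technology mapping. The Python sketch below enumerates the k-feasible cuts of every node in a small hypothetical network. It is standard cut enumeration, offered only to make the input bound concrete; it is not the article's depth-bounded mapping algorithm and ignores depth and array width.

```python
from itertools import product


def enumerate_cuts(fanins, k):
    """Enumerate the k-feasible cuts of every node in a Boolean network.

    fanins maps each node to a tuple of its fanin nodes and must list nodes in
    topological order (primary inputs map to an empty tuple). A cut of node v
    is a set of nodes that determines v; it is k-feasible if it has at most k
    nodes, i.e. v can be realized as a sub-function with at most k inputs.
    """
    cuts = {}
    for node, ins in fanins.items():
        node_cuts = {frozenset([node])}        # the trivial cut {v}
        if ins:
            # Merge one cut from each fanin; keep the union if small enough.
            for combo in product(*(cuts[i] for i in ins)):
                merged = frozenset().union(*combo)
                if len(merged) <= k:
                    node_cuts.add(merged)
        cuts[node] = node_cuts
    return cuts


# Hypothetical network: out = (a AND b) OR (c AND d)
network = {
    "a": (), "b": (), "c": (), "d": (),
    "g1": ("a", "b"),
    "g2": ("c", "d"),
    "out": ("g1", "g2"),
}
print(sorted(sorted(c) for c in enumerate_cuts(network, 3)["out"]))
# [['a', 'b', 'g2'], ['c', 'd', 'g1'], ['g1', 'g2'], ['out']]
```

With k = 3 the cut {a, b, c, d} is rejected, so `out` cannot be realized as a single 4-input sub-function; raising k to 4 admits it.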

Timing Path-Driven Cycle Cutting for Sequential Controllers

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2893473

William Lee, Vikas S. Vij, K. Stevens

Citations: 1

Cyber-Physical Co-Simulation Framework for Smart Cells in Scalable Battery Packs

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2891407

S. Steinhorst, M. Kauer, Arne Meeuw, Swaminathan Narayanaswamy, M. Lukasiewycz, S. Chakraborty

This article introduces a Cyber-physical Co-Simulation Framework (CPCSF) for the design and analysis of smart cells that enable scalable battery pack and Battery Management System (BMS) architectures. In contrast to conventional battery packs, where all cells are monitored and controlled centrally, each smart cell is equipped with its own electronics in the form of a Cell Management Unit (CMU). The CMU maintains the cell in a safe and healthy operating state, while system-level battery management functions are performed by cooperation of the smart cells via communication. Here, the smart cells collaborate in a self-organizing fashion without a central controller instance. This enables maximum scalability and modularity, significantly simplifying integration of battery packs. However, for this emerging architecture, system-level design methodologies and tools have not yet been investigated. Instead, components are developed individually and then manually tested in a hardware development platform. Consequently, the systematic design of the hardware/software architecture of smart cells requires a cyber-physical multi-level co-simulation of the network of smart cells that includes all components from the software, electronic, electric, and electrochemical domains. This comprises distributed BMS algorithms running on the CMUs, the communication network, control circuitry, cell balancing hardware, and battery cell behavior. For this purpose, we introduce a CPCSF that enables rapid design and analysis of smart cell hardware/software architectures. Our framework is then applied to investigate request-driven active cell balancing strategies that make use of the decentralized system architecture. In an exhaustive analysis on a realistic 21.6 kWh Electric Vehicle (EV) battery pack containing 96 smart cells in series, the CPCSF is able to simulate hundreds of balancing runs together with all system characteristics, using the proposed request-driven balancing strategies at highest accuracy within an overall time frame of several hours. Consequently, the presented CPCSF for the first time allows us to quantitatively and qualitatively analyze the behavior of smart cell architectures for real-world applications.

Pages: 62:1-62:26

Citations: 19

Index-Resilient Zero-Suppressed BDDs: Definition and Operations

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2905363

A. Bernasconi, V. Ciriani

Citations: 3

A Hardware-Assisted Energy-Efficient Processing Model for Activity Recognition Using Wearables

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2886096

Hassan Ghasemzadeh, Ramin Fallahzadeh, R. Jafari

Wearables are being widely utilized in health and wellness applications, primarily due to recent advances in sensor and wireless communication technologies, which enhance the promise of wearable systems in providing continuous and real-time monitoring and interventions. Wearables are generally composed of hardware/software components for the collection, processing, and communication of physiological data. Practical implementation of wearable monitoring in real-life applications is currently limited by notable obstacles: the wearability and form factor are dominated by the amount of energy needed for sensing, processing, and communication. In this article, we propose an ultra-low-power granular decision-making architecture, also called a screening classifier, which can be viewed as tiered wake-up circuitry consuming three orders of magnitude less power than state-of-the-art low-power microcontrollers. This processing model is built from computationally simple template matching modules that perform coarse- to fine-grained analysis of the signals, increasing processing power consumption gradually and on demand. Initial template matching rejects signals that are clearly not of interest from the signal processing chain, keeping the rest of the processing blocks idle. If the signal is likely of interest, the sensitivity and the power of the template matching modules are gradually increased, and ultimately, the main processing unit is activated. We propose optimization techniques to efficiently split a full template into smaller bins, called mini-templates, and activate only a subset of bins during each classification decision. Our experimental results on real data show that this signal screening model reduces the power consumption of the processing architecture by 70% while the sensitivity of detection remains at least 80%.

Pages: 58:1-58:27

Citations: 15
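The tiered screening idea in the wearables abstract can be sketched in a few lines of Python. The sketch assumes equal-width mini-templates, a mean-absolute-error match score, and hand-picked per-tier thresholds, none of which come from the article. Each tier compares more mini-templates than the one before, and a signal that fails any tier is rejected before the later, costlier tiers run. It models only the decision logic, not the hardware or the article's bin-splitting optimization.

```python
def split_bins(template, n_bins):
    """Split a template into n_bins equal-width mini-templates (assumed split)."""
    if len(template) % n_bins:
        raise ValueError("template length must be divisible by n_bins")
    size = len(template) // n_bins
    return [(b * size, template[b * size:(b + 1) * size]) for b in range(n_bins)]


def screen(signal, template, tiers, n_bins=4):
    """Coarse-to-fine screening cascade.

    tiers: list of (mini_templates_checked, max_mean_abs_error), with each
    tier checking more mini-templates than the last. Returns
    (accepted, tiers_run).
    """
    bins = split_bins(template, n_bins)
    for tier, (n_active, threshold) in enumerate(tiers, start=1):
        errors = [abs(signal[start + i] - v)
                  for start, mini in bins[:n_active]
                  for i, v in enumerate(mini)]
        if sum(errors) / len(errors) > threshold:
            return False, tier        # rejected early; later tiers stay idle
    return True, len(tiers)           # passed every tier: wake the main processor


template = [0, 1, 2, 3, 4, 5, 6, 7]
tiers = [(1, 0.5), (2, 0.5), (4, 0.5)]  # (mini-templates checked, max mean error)
print(screen(template, template, tiers))                  # (True, 3)
print(screen([10] * 8, template, tiers))                  # (False, 1)
print(screen([0, 1, 9, 9, 9, 9, 9, 9], template, tiers))  # (False, 2)
```

The second tier is reached only when the first mini-template already matches, which is where the energy saving of a cascade comes from.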

Efficient Algorithms for Discrete Gate Sizing and Threshold Voltage Assignment Based on an Accurate Analytical Statistical Yield Gradient

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2016-09-22 DOI: 10.1145/2896819

S. Ramprasath, V. Vasudevan

In this article, we derive a simple and accurate expression for the change in timing yield due to a change in the gate delay distribution. It is based on analytical bounds that we have derived for the moments of the circuit and path delay. Based on this, we propose computationally efficient algorithms for (1) discrete gate sizing and (2) simultaneous gate sizing and threshold voltage (VT) assignment so that the circuit meets a timing yield specification under parameter variations. The use of this analytical yield gradient within a gradient-based timing yield optimization algorithm results in a significant improvement in the runtime as compared to the numerical method, while achieving the same final yield. It also allows us to explore a larger search space in each iteration more efficiently, which is required in the case of simultaneous resizing and VT assignment. We also propose heuristics for resizing/changing the VT of multiple gates in each iteration. This makes it possible to optimize the timing yield for large circuits. Results on ITC ’99 benchmarks show that the proposed multinode resizing algorithm results in a significant improvement in the runtime with a marginal average area penalty and no cost to the final yield achieved.

Pages: 66:1-66:27

Citations: 2
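The gate-sizing article derives its yield gradient from analytical bounds on the moments of circuit and path delay. The much simpler Python sketch below assumes a single Gaussian path delay D ~ N(mu, sigma²) against a clock period T, so timing yield is Y = Φ((T - mu)/sigma) and its gradient has a closed form. It only illustrates why a closed-form gradient is cheaper than a finite-difference estimate (used here as a stand-in for a numerical method), which needs two extra yield evaluations per parameter; it is not the article's expression, and the delay numbers are hypothetical.

```python
import math


def timing_yield(mu: float, sigma: float, t_clk: float) -> float:
    """P(D <= t_clk) for a Gaussian delay D ~ N(mu, sigma^2)."""
    z = (t_clk - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def yield_gradient(mu: float, sigma: float, t_clk: float):
    """Closed-form (dY/dmu, dY/dsigma) for the same single-Gaussian model."""
    z = (t_clk - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    return -pdf / sigma, -pdf * z / sigma


def finite_difference_gradient(mu: float, sigma: float, t_clk: float, h: float = 1e-6):
    """Central differences: four extra yield evaluations for two parameters."""
    d_mu = (timing_yield(mu + h, sigma, t_clk)
            - timing_yield(mu - h, sigma, t_clk)) / (2 * h)
    d_sigma = (timing_yield(mu, sigma + h, t_clk)
               - timing_yield(mu, sigma - h, t_clk)) / (2 * h)
    return d_mu, d_sigma


# Hypothetical numbers: mean path delay 1.0 ns, sigma 0.1 ns, clock period 1.15 ns.
print(timing_yield(1.0, 0.1, 1.15))
print(yield_gradient(1.0, 0.1, 1.15))
print(finite_difference_gradient(1.0, 0.1, 1.15))
```

Both gradient functions agree to several decimal places; the closed form simply avoids the repeated yield evaluations, which matters when each evaluation is a full statistical timing analysis.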