Guoqing Chen, Yi Xu, Xing Hu, Xiangyang Guo, Jun Ma, Yu Hu, Yuan Xie
{"title":"TSocket: Thermal Sustainable Power Budgeting","authors":"Guoqing Chen, Yi Xu, Xing Hu, Xiangyang Guo, Jun Ma, Yu Hu, Yuan Xie","doi":"10.1145/2837023","DOIUrl":"https://doi.org/10.1145/2837023","url":null,"abstract":"As technology scales, thermal management for multicore architectures becomes a critical challenge due to increasing power density. Existing power budgeting techniques focus on maximizing performance under a given power budget by optimizing the core configurations. In multicore era, a chip-wide power budget, however, is not sufficient to ensure thermal constraints because the thermal sustainable power capacity varies with different threading strategies and core configurations. In this article, we propose two models to dynamically estimate the thermal sustainable power capacity in homogeneous multicore systems: uniform power model and nonuniform power model. These two models convert the thermal effect of threading strategies and core configurations into power capacity, which provide a context-based core power capacity for power budgeting. Based on these models, we introduce a power budgeting framework aiming to improve the performance within thermal constraints, named as TSocket. Compared to the chip-wide power budgeting solution, TSocket shows 19% average performance improvement for the PARSEC benchmarks in single program scenario and up to 11% performance improvement in multiprogram scenario. The performance improvement is achieved by reducing thermal violations and exploring thermal headrooms.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"74 1","pages":"29:1-29:22"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84413627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A C2RTL Framework Supporting Partition, Parallelization, and FIFO Sizing for Streaming Applications","authors":"Daming Zhang, Shuangchen Li, Yongpan Liu, X. Hu, Xinyu He, Yining Zhang, Pei Zhang, Huazhong Yang","doi":"10.1145/2797135","DOIUrl":"https://doi.org/10.1145/2797135","url":null,"abstract":"Developing circuits for streaming applications written in C (or its variants) can benefit greatly from C-to-RTL (C2RTL) synthesis. Yet, most existing C2RTL tools lack system-level options to trade off various design constraints, such as delay and area. This article introduces a systematic way to accomplish C2RTL synthesis for streaming applications containing thousands of lines of C (or its variants) codes. Synthesizing circuits for such large applications presents serious challenges for existing C2RTL tools. Specifically, the proposed approach determines simultaneously the number of pipeline stages and the number of times that each functional block is duplicated in each pipeline stage. A mixed integer linear programming-based solution is formulated for obtaining the optimal solution. Furthermore, a heuristic algorithm is developed for large-scale problems. To accommodate the differences of the data rates between the adjacent hardware modules, first-in-first-out (FIFO) buffers are indispensable, but their overheads are nonnegligible. A parallelism-aware FIFO sizing method is also introduced to determine the optimal sizes of FIFOs. Experimental results on seven real-world applications demonstrate that the algorithms in the synthesis flow can make effective design trade-offs and find superior solutions in a short time compared with existing approaches. Furthermore, the algorithms achieve optimal results in most cases with subsecond running time.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"77 1","pages":"19:1-19:32"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83923146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of 3D Digital Microfluidic Biochips for the Multiplexed Polymerase Chain Reaction","authors":"Zipeng Li, Tsung-Yi Ho, K. Chakrabarty","doi":"10.1145/2811259","DOIUrl":"https://doi.org/10.1145/2811259","url":null,"abstract":"A digital microfluidic biochip (DMFB) is an attractive technology platform for revolutionizing immunoassays, clinical diagnostics, drug discovery, DNA sequencing, and other laboratory procedures in biochemistry. In most of these applications, real-time polymerase chain reaction (PCR) is an indispensable step for amplifying specific DNA segments. To reduce the reaction time to meet the requirement of “real-time” applications, multiplexed PCR is widely utilized. In recent years, three-dimensional (3D) DMFBs that integrate photodetectors (i.e., cyberphysical DMFBs) have been developed, which offer the benefits of smaller size, higher sensitivity, and faster result generations. However, current DMFB design methods target optimization in only two dimensions, thus ignoring the 3D two-layer structure of a DMFB. Furthermore, these techniques ignore practical constraints related to the interference between on-chip device pairs, the performance-critical PCR thermal loop, and the physical size of devices. Moreover, some practical issues in real scenarios are not stressed (e.g., the avoidance of the cross-contamination for multiplexed PCR). In this article, we describe an optimization solution for a 3D DMFB and present a three-stage algorithm to realize a compact 3D PCR chip layout, which includes: (i) PCR thermal-loop optimization, (ii) 3D global placement based on Strong-Push-Weak-Pull (SPWP) model, and (iii) constraint-aware legalization. To avoid cross-contamination between different DNA samples, we also propose a Minimum-Cost-Maximum-Flow-based (MCMF-based) method for reservoir assignment. Simulation results for four laboratory protocols demonstrate that the proposed approach is effective for the design and optimization of a 3D chip for multiplexed real-time PCR.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"39 1","pages":"25:1-25:27"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90613345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design-for-Testability for Functional Broadside Tests under Primary Input Constraints","authors":"I. Pomeranz","doi":"10.1145/2831231","DOIUrl":"https://doi.org/10.1145/2831231","url":null,"abstract":"Functional broadside tests avoid overtesting of delay faults by creating functional operation conditions during the clock cycles where delay faults are detected. When a circuit is embedded in a larger design, a functional broadside test needs to take into consideration the functional constraints that the design creates for its primary input vectors. At the same time, application of primary input vectors as part of a scan-based test requires hardware support. An earlier work considered the case where a primary input vector is held constant during a test. The approach described in this article matches the hardware for applying primary input vectors to the functional constraints that the design creates. This increases the transition fault coverage that can be achieved by functional broadside tests. This article also considers the effect on the transition fault coverage achievable using close-to-functional broadside tests.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"103 1","pages":"35:1-35:18"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74954796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Security-Aware Obfuscated Priority Assignment for Automotive CAN Platforms","authors":"M. Lukasiewycz, Philipp Mundhenk, S. Steinhorst","doi":"10.1145/2831232","DOIUrl":"https://doi.org/10.1145/2831232","url":null,"abstract":"Security in automotive in-vehicle networks is an increasing problem with the growing connectedness of road vehicles. This article proposes a security-aware priority assignment for automotive controller area network (CAN) platforms with the aim of mitigating scaling effects of attacks on vehicle fleets. CAN is the dominating field bus in the automotive domain due to its simplicity, low cost, and robustness. While messages might be encrypted to enhance the security of CAN systems, their priorities are usually identical for automotive platforms, comprising generally a large number of vehicle models. As a result, the identifier uniquely defines which message is sent, allowing attacks to scale across a fleet of vehicles with the same platform. As a remedy, we propose a methodology that is capable of determining obfuscated message identifiers for each individual vehicle. Since identifiers directly represent message priorities, the approach has to take the resulting response time variations into account while satisfying application deadlines for each vehicle schedule separately. Our approach relies on Quadratically Constrained Quadratic Program (QCQP) solving in two stages, specifying first a set of feasible fixed priorities and subsequently bounded priorities for each message. With the obtained bounds, obfuscated identifiers are determined, using a very fast randomized sampling. The experimental results, consisting of a large set of synthetic test cases and a realistic case study, give evidence of the efficiency of the proposed approach in terms of scalability. The results also show that the diversity of obtained identifiers is effectively optimized with our approach, resulting in a very good obfuscation of CAN messages in in-vehicle communication.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"36 1","pages":"32:1-32:27"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84394489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Write Performance by Controlling Target Resistance Distributions in MLC PRAM","authors":"Youngsik Kim, S. Yoo, Sunggu Lee","doi":"10.1145/2820610","DOIUrl":"https://doi.org/10.1145/2820610","url":null,"abstract":"Multi-level cell (MLC) phase change RAM (PRAM) is expected to offer lower cost main memory than DRAM. However, poor write performance is one of the most critical problems for practical applications of MLC PRAM. In this article, we present two schemes to improve write performance by controlling the target resistance distribution of MLC PRAM cells. First, we propose multiple RESET/SET operations that relax the target resistance bands of intermediate logic levels with additional RESET/SET operations, which reduces the program time of intermediate logic levels, thereby improving write performance. Second, we propose a two-step write scheme consisting of lightweight write and idle-time completion write that exploits the fact that hot dirty data tend to be overwritten in a short time period and the MLC PRAM often has long idle times. Experimental results show that the multiple RESET/SET and two-step write schemes result in an average IPC improvement of 15.7% and 10.4%, respectively, on a hybrid DRAM/PRAM main memory subsystem. Furthermore, their integrated solution results in an average IPC improvement of 23.2% (up to 46.4%).","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"52 1","pages":"23:1-23:27"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74146305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adapting to Varying Distribution of Unknown Response Bits","authors":"Chandra K. H. Suresh, O. Sinanoglu, S. Ozev","doi":"10.1145/2835489","DOIUrl":"https://doi.org/10.1145/2835489","url":null,"abstract":"Traditionally, test patterns that are generated for a given circuit are applied in an identical manner to all manufactured devices until each device under test either fails or passes each test. With increasing process variations, the statistical diversity of manufactured devices is increasing, making such one-size-fits-all approaches increasingly inefficient. Adaptive test techniques address this problem by tailoring the test decisions for the statistical characteristics of the device under test. In this article, we present several adaptive strategies to enable adaptive unknown bit masking for faster-than-at-speed testing so as to ensure no yield loss while attaining the maximum test quality based on tester memory constraints. We also develop a tester-enabled compression scheme that helps alleviate memory constraints further, shifting the tradeoff space favorably to improve test quality.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"23 1","pages":"33:1-33:22"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74621912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Soft-Error Robust and Energy-Efficient Register File in GPGPUs using Resistive Memory","authors":"Jingweijia Tan, Zhi Li, Mingsong Chen, Xin Fu","doi":"10.1145/2827697","DOIUrl":"https://doi.org/10.1145/2827697","url":null,"abstract":"The increasing adoption of graphics processing units (GPUs) for high-performance computing raises the reliability challenge, which is generally ignored in traditional GPUs. GPUs usually support thousands of parallel threads and require a sizable register file. Such large register file is highly susceptible to soft errors and power-hungry. Although ECC has been adopted to register file in modern GPUs, it causes considerable power overhead, which further increases the power stress. Thus, an energy-efficient soft-error protection mechanism is more desirable. Besides its extremely low leakage power consumption, resistive memory (e.g., spin-transfer torque RAM) is also immune to the radiation induced soft errors due to its magnetic field based storage. In this article, we propose to LEverage reSistive memory to enhance the Soft-error robustness and reduce the power consumption (LESS) of registers in the General-Purpose computing on GPUs (GPGPUs). Since resistive memory experiences longer write latency compared to SRAM, we explore the unique characteristics of GPGPU applications to obtain the win-win gains: achieving the near-full soft-error protection for the register file, and meanwhile substantially reducing the energy consumption with negligible performance degradation. Our experimental results show that LESS is able to mitigate the registers soft-error vulnerability by 86% and achieve 61% energy savings with negligible (e.g., 1%) performance degradation.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"155 1","pages":"34:1-34:25"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72787720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance-Driven Assignment of Buffered I/O Signals in Area-I/O Flip-Chip Designs","authors":"Jin-Tai Yan","doi":"10.1145/2818642","DOIUrl":"https://doi.org/10.1145/2818642","url":null,"abstract":"Due to the inappropriate assignment of bump pads or the improper assignment of I/O buffers, the constructed buffered I/O signals in an area-I/O flip-chip design may yield longer maximum delay. In this article, the problem of assigning performance-driven buffered I/O signals in an area-I/O flip-chip design is first formulated. Furthermore, the assignment of the buffered I/O signals can be divided into two sequential phases: Construction of performance-driven I/O signals and Assignment of timing-constrained I/O buffers. Finally, an efficient matching-based approach is proposed to construct the performance-driven I/O signals for the given I/O pins and assign the timing-constrained I/O buffers into the constructed I/O signals in the assignment of the buffered I/O signals in an area-I/O flip-chip design. Compared with the experimental results of seven tested circuits in the Elmore delay model, the experimental results show that the matching-based assignment in our proposed approach can reduce 3.56% of the total path delay, 9.72% of the maximum input delay, 5.90% of the input skew, 5.64% of the maximum output delay, and 6.25% of the output skew on average by reassigning the I/O buffers. Our proposed approach can further reduce 38.89% of the total path delay, 44.00% of the maximum input delay, 49.13% of the input skew, 44.93% of the maximum output delay, and 50.82% of output skew on average by reconstructing the I/O signals and reassigning the I/O buffers into the I/O signals. Compared with the experimental results of seven tested circuits in Peng's [Peng et al. 2006] publication, the experimental results show that our proposed matching-based approach can further reduce 71.06% of the total path delay, 67.83% of the maximum input delay, 59.84% of the input skew, 68.87% of the maximum output delay, and 61.46% of the output skew on average. On the other hand, compared with the experimental results of five tested circuits in Lai's [Lai and Chen 2008] publication, the experimental results show that our proposed approach can further reduce 75.36% of the total path delay, 48.94% of the input skew, and 52.80% of the output skew on the average.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"1 1","pages":"21:1-21:24"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76260967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Kritikakou, F. Catthoor, Vasilios I. Kelefouras, C. Goutis
{"title":"Array Size Computation under Uniform Overlapping and Irregular Accesses","authors":"A. Kritikakou, F. Catthoor, Vasilios I. Kelefouras, C. Goutis","doi":"10.1145/2818643","DOIUrl":"https://doi.org/10.1145/2818643","url":null,"abstract":"The size required to store an array is crucial for an embedded system, as it affects the memory size, the energy per memory access, and the overall system cost. Existing techniques for finding the minimum number of resources required to store an array are less efficient for codes with large loops and not regularly occurring memory accesses. They have to approximate the accessed parts of the array leading to overestimation of the required resources. Otherwise, their exploration time is increased with an increase over the number of the different accessed parts of the array. We propose a methodology to compute the minimum resources required for storing an array which keeps the exploration time low and provides a near-optimal result for regularly and non-regularly occurring memory accesses and overlapping writes and reads.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"115 1","pages":"22:1-22:35"},"PeriodicalIF":0.0,"publicationDate":"2016-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83845665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}