{"title":"Trace-based automated logical debugging for high-level synthesis generated circuits","authors":"Pietro Fezzardi, M. Castellana, Fabrizio Ferrandi","doi":"10.1109/ICCD.2015.7357111","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357111","url":null,"abstract":"In this paper we present an approach for debugging hardware designs generated by High-Level Synthesis (HLS), relieving users from the burden of identifying the signals to trace and from the error-prone task of manually checking the traces. The necessary steps are performed after HLS, independently of it and without affecting the synthesized design. For this reason our methodology should be easily adaptable to any HLS tool. The proposed approach makes full use of HLS compile-time information. The executions of the simulated design and the original C program can be compared, checking if there are discrepancies between values of C variables and signals in the design. The detection is completely automated, that is, it does not need any input but the program itself and the user does not have to know anything about the overall compilation process. The design can be validated on a given set of test cases and the discrepancies are detected by the tool. Relationships between the original high-level source code and the generated HDL are kept by the compiler and shown to the user. The granularity of such discrepancy analysis is per-operation and it includes the temporary variables inserted by the compiler. As a consequence the design can be debugged as is, with no restrictions on optimizations available during HLS. 
We show how this methodology can be used to identify different kinds of bugs: 1) introduced by the HLS tool used for the synthesis; 2) introduced using buggy libraries of hardware components for HLS; 3) undefined behavior bugs in the original high-level source code.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124313369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"POS: A Popularity-based Online Scaling scheme for RAID-structured storage systems","authors":"Si Wu, Yinlong Xu, Yongkun Li, Yunfeng Zhu","doi":"10.1109/ICCD.2015.7357095","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357095","url":null,"abstract":"The ever-increasing demand of storage capability leads to scaling requirement in RAID-structured storage systems. Previous approaches to RAID scaling mainly focus on minimizing data migration, without considering the user-level application accesses. However, the mixed scaling I/Os and user accesses in practical systems will interfere with each other, which results in significant performance degradation of both the data migration time and the user response time. In this paper, we divide the whole storage space into multiple zones and measure the popularity (mainly using the metric of access frequency) of each zone. Based on the measured popularity, we propose an online scheme, namely Popularity-based Online Scaling (POS), to scale RAID-structured storage systems. The main idea of POS is to scale storage areas with high popularity first so as to better exploit workload locality. POS can efficiently alleviate the performance degradation of user response time and data migration time during the scaling process. It can be readily deployed atop various conventional RAID scaling approaches to improve their performance. To evaluate the performance of POS, we implement FastScale and FastScale with POS (POS-FS) in the same system. 
Through extensive benchmark studies on real-system workloads, we show that POS can efficiently reduce the response time to user requests and scaling I/Os and improve the sequentiality of data accesses.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117262999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA-SPICE: A simulation-based power estimation framework for FPGAs","authors":"Xifan Tang, P. Gaillardon, G. Micheli","doi":"10.1109/ICCD.2015.7357183","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357183","url":null,"abstract":"Mainstream Field Programmable Gate Array (FPGA) power estimation tools are based on probabilistic activity estimation and analytical power models. The power consumption of the programmable resources of FPGAs is highly sensitive to their configurations. Due to their highly flexible nature, the configurations of FPGA routing multiplexers or Look Up Tables (LUTs) differ greatly from one design to another, but current analytical power models cannot accurately capture the associated power differences. In this paper, we introduce a simulation-based power estimation framework for FPGAs, called FPGA-SPICE, which supports any FPGA architecture that can be described with an architectural description language. Our power estimation engine automatically generates accurate SPICE netlists according to the FPGA configurations and enables precise power analysis of FPGA architectures. SPICE testbenches can be generated at different levels of complexity, denoted as full-chip-level, grid-level and component-level testbenches. Full-chip-level testbenches dump the netlists associated with the complete FPGA fabric. To reduce simulation time, FPGA-SPICE can split the full-chip-level testbenches into grid-level testbenches, each consisting of a complete logic block netlist, or component-level testbenches, which consider individual circuit elements, i.e., multiplexers, LUTs, flip-flops, etc., separately. We show that the grid/component-level approach can achieve 14× speed-up with a moderate 14% accuracy loss, compared to the full-chip level. We also use FPGA-SPICE to study the power characteristics of a commercial FPGA architecture at different technology nodes. 
Experimental results show that the global routing architecture consumes 50% of the total power, the local routing architecture accounts for 40% of the total power, and the remaining 10% comes from the LUTs and flip-flops.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125537227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emulation-based selection and assessment of assertion checkers for post-silicon validation","authors":"Pouya Taatizadeh, N. Nicolici","doi":"10.1109/ICCD.2015.7357083","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357083","url":null,"abstract":"The objective of post-silicon validation is to detect design errors on early silicon prototypes. Electrically-induced errors commonly manifest as bit-flips in the logic domain and they occur under unique operating conditions, which are often not-easily-repeatable. In order to shorten the long detection latencies from an error's manifestation until its observation (i.e. system crash), embedded assertion checkers can be employed. Nonetheless, relying on simulation-based experiments for selecting and assessing the usefulness of a subset of assertion checkers (to be committed to silicon) suffers from limitations associated with the slow simulation speed. To address this concern, in this paper we present a systematic method to automatically design emulation-based experiments that can aid the selection and assessment of the embedded assertion checkers. Our results indicate improvements of up to 10% on average for the coverage of flip-flops that are affected by bit-flips when compared to results obtained from simulation-based experiments.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"34 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124465085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-efficient reconstruction of compressively sensed bioelectrical signals with stochastic computing circuits","authors":"Yufei Ma, Minkyu Kim, Yu Cao, Jae-sun Seo, S. Vrudhula","doi":"10.1109/ICCD.2015.7357144","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357144","url":null,"abstract":"Compressive sensing (CS) allows acquiring sparse signals at sub-Nyquist rate, offering an energy-efficient solution to data acquisition. This is especially important to reduce communication data for mobile medical applications. However, reconstructing the signal from CS is usually left off-line due to the complex computations. In this paper, we integrate two key technologies to enable on-line energy-efficient CS signal reconstruction. These are (1) the use of Bayesian CS Belief Propagation (CS-BP) as the algorithm basis and (2) the novel design of stochastic computing (SC) circuits to efficiently map CS-BP algorithm. The overall signal reconstruction system is implemented with digital SC circuits in 65nm CMOS and recovers compressively sensed electrocardiography (ECG) and electromyography (EMG) signals with 11X to 8X data compression factor. Compared to a conventional binary design, post-layout simulation results show that the proposed stochastic design performs reconstruction with 5X energy-delay product improvement and 2X area reduction.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126415987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache allocation for fixed-priority real-time scheduling on multi-core platforms","authors":"Gustavo A. Chaparro-Baquero, Soamar Homsi, Omara Vichot, Shaolei Ren, Gang Quan, Shangping Ren","doi":"10.1109/ICCD.2015.7357169","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357169","url":null,"abstract":"The increased resource sharing on multi-core platforms has posed significant challenges on the predictability of real-time systems. Cache memory partitioning has proven to be one of the most effective methods to improve the predictability and also the schedulability of real-time systems. In this paper, we study how to allocate cache memory of a multi-core platform when scheduling fixed-priority hard real-time tasks. As the bounded worst-case execution time (WCET) of a real-time task varies with its cache allocation, the challenges of this problem are twofold: how to judiciously allocate the cache memory among all real-time tasks and how to map real-time tasks to each core to improve the schedulability. To address these challenges, we develop an approach that takes into consideration not only the WCET variations with cache allocations but also the task period relationship and thus can significantly improve the schedulability of real-time tasks. 
Our simulation results, based on the SPEC CPU2000 benchmarks suite, show that our approach can increase the schedulability of real-time tasks up to four times when compared to other similar scheduling mechanisms.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125985083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ROST-C: Reliability driven optimisation and synthesis techniques for combinational circuits","authors":"S. Grandhi, D. McCarthy, C. Spagnol, E. Popovici, S. Cotofana","doi":"10.1109/ICCD.2015.7357141","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357141","url":null,"abstract":"Traditional logic synthesis methodologies are driven by timing, power, and area constraints. However, due to aggressive technology shrinking and lower power requirements, circuit reliability is fast turning out to be yet another major constraint in the VLSI design flow. Soft errors, which traditionally affected only the memories, are now also resulting in logic circuit reliability degradation. In this paper, we present a systematic and integrated methodology to address and improve the combinational circuit reliability measured in terms of Soft Error Rate (SER). The proposed SER reduction framework makes use of rewriting based logic optimisation technique which employs local transformations. The main idea behind our proposal is to replace parts of the circuit with functionally equivalent but more reliable counterparts chosen from a pre-computed subset of Negation-Permutation-Negation (NPN) classes of 4-variable functions. Cut enumeration and Boolean matching driven by reliability aware optimisation algorithm are used to identify best possible replacement candidates. Our experiments on a set of MCNC benchmark circuits indicate that the proposed framework can achieve up to 75% reduction of output error probability. 
On average, about 14% SER reduction is obtained at the expense of very low area overhead of 6.57% that results in 13.52% higher power consumption.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133336199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CSL: Coordinated and scalable logic synthesis techniques for effective NBTI reduction","authors":"Chen-Hsuan Lin, Subhendu Roy, Chun-Yao Wang, D. Pan, Deming Chen","doi":"10.1109/ICCD.2015.7357109","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357109","url":null,"abstract":"Negative Bias Temperature Instability (NBTI) has become a major reliability concern in nanoscale designs. Although several previous studies have been proposed to address the NBTI effect during logic synthesis, their performance is limited because they focus on a single logic synthesis stage. Additionally, their complicated algorithms are not scalable to large designs. To tackle this, we propose a coordinated and scalable logic synthesis approach, which integrates techniques at different logic synthesis stages, ranging from subject graph to technology mapping and mapped netlist, to achieve an effective NBTI reduction. To the best of our knowledge, this is the first work that considers and mitigates NBTI impact in subject graphs, the earlier stage of logic synthesis. 
Experimental results on industry-strength benchmarks show that our approach can achieve 6.5% NBTI delay reduction with merely 2.5% area overhead on average, while a previous work barely gets NBTI delay reduction when the circuits are optimized beforehand, the circuit sizes are large, and standard cell libraries are richer.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115994718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenNVM: An open-sourced FPGA-based NVM controller for low level memory characterization","authors":"Jie Zhang, Gieseo Park, M. Shihab, D. Donofrio, J. Shalf, Myoungsoo Jung","doi":"10.1109/ICCD.2015.7357179","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357179","url":null,"abstract":"Accurate characterization of real device samples is essential for understanding the true potential of the emerging non-volatile memories (NVMs) and identifying their optimal placement in the memory hierarchy. Even though NVM devices are now available from different manufacturers, the lack of an appropriate NVM controller and evaluation platform in the public domain is the main challenge in extracting empirical data from these real devices. In this paper, we present OpenNVM, an open-sourced, highly configurable FPGA-based evaluation/characterization platform for various NVM technologies. Through OpenNVM, this work reveals important low-level NVM characteristics, including i) static and dynamic latency disparity, ii) error rate variation, iii) power consumption behavior, and iv) the interrelationship between frequency and NVM operational current. In addition, we also examine state-of-the-art write-once-memory (WOM) codes on a real NVM device and study diverse system-level performance impacts based on our findings. 
All FPGA source code and detailed information of our hardware design is ready to be open-sourced and downloaded for free.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125174574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clotho: Proactive wearout deceleration in Chip-Multiprocessor interconnects","authors":"A. Vitkovskiy, V. Soteriou, Paul V. Gratz","doi":"10.1109/ICCD.2015.7357092","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357092","url":null,"abstract":"With advancing process technology, Chip-Multiprocessors (CMPs) are experiencing ever worsening reliability due to prolonged operational stresses. The network-on-chip that interconnects the components of CMPs is especially vulnerable to such wearout-induced failure. To tackle this ominous threat we present Clotho, a novel, wearout-aware routing algorithm. Clotho continuously considers the stresses the on-chip interconnect experiences at runtime, along with temperature and fabrication process variation metrics, steering traffic away from locations that are most prone to Electromigration (EM)- and Hot-Carrier Injection (HCI)-induced wear. Under realistic workloads Clotho yields 66% and 8% average increases in mean time to failure for EM and HCI, respectively.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128255865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}