{"title":"Runtime dependency analysis for loop pipelining in High-Level Synthesis","authors":"M. Alle, Antoine Morvan, Steven Derrien","doi":"10.1145/2463209.2488796","DOIUrl":"https://doi.org/10.1145/2463209.2488796","url":null,"abstract":"Research on High-Level Synthesis has mainly focused on applications with statically determinable characteristics and current tools often perform poorly in presence of data-dependent memory accesses. The reason is that they rely on conservative static scheduling strategies, which lead to inefficient implementations. In this work, we propose to address this issue by leveraging well-known techniques used in superscalar processors to perform runtime memory disambiguation. Our approach, implemented as a source-to-source transformation at the C level, demonstrates significant performance improvements for a moderate increase in area while retaining portability among HLS tools.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124628139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Routing congestion estimation with real design constraints","authors":"Wen-Hao Liu, Yaoguang Wei, C. Sze, C. Alpert, Zhuo Li, Yih-Lang Li, Natarajan Viswanathan","doi":"10.1145/2463209.2488847","DOIUrl":"https://doi.org/10.1145/2463209.2488847","url":null,"abstract":"To address the routability issue, routing congestion estimators (RCEs) have become essential in industrial design flows. Recently, several RCEs [1-4] based on global routing engines have been developed, but they typically ignore the effects of routing on timing so that the identified routing paths may be overlong and thus impractical. To be aware of the timing issues, our proposed global-routing-based RCE obeys the layer directive and scenic constraints to respectively limit the routing layers and the maximum routing wirelength of the potentially timing-critical nets. To handle the scenic constraints, we propose a novel method based on a relaxation-legalization scheme. Also, because the work in [5] reveals that congestion ratio is a better indicator than overflow to evaluate routability, this work focuses on minimizing the congestion ratio rather than overflows. As will be shown, the problem of minimizing congestion ratio is more complicated than minimizing overflows, so we develop a new rip-up and rerouting scheme to reduce congestion and further to approach a target congestion ratio. Moreover, to fit the demands of practical uses, this work presents a control utility to trade off runtime and quality, which is an essential function to an industrial RCE tool. Experiments reveal that the proposed RCE is faster and more accurate than another industrial global-routing-based RCE.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114498655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart non-default routing for clock power reduction","authors":"A. Kahng, Seokhyeong Kang, Hyein Lee","doi":"10.1145/2463209.2488846","DOIUrl":"https://doi.org/10.1145/2463209.2488846","url":null,"abstract":"At advanced process nodes, non-default routing rules (NDRs) are integral to clock network synthesis methodologies. NDRs apply wider wire widths and spacings to address electromigration constraints, and to reduce parasitic and delay variations. However, wider wires result in larger driven capacitance and dynamic power. In this work, we quantify the potential for capacitance and power reduction through the application of “smart” NDR (SNDR) that substitute narrower-width NDRs on selected clock network segments, while maintaining skew, slew, delay and EM reliability criteria. We propose a practical methodology to apply smart NDRs in standard clock tree synthesis flows. Our studies with a 32/28nm library and open-source benchmarks confirm substantial (average of 9.2%) clock wire capacitance reduction and an average of 4.9% clock switching power savings over the current fixed-NDR methodology, without loss of QoR in the clock distribution.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115486221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sacha: The Stanford carbon nanotube controlled handshaking robot","authors":"M. Shulaker, J. V. Rethy, G. Hills, Hong-Yu Chen, G. Gielen, H. Wong, S. Mitra","doi":"10.1145/2463209.2488887","DOIUrl":"https://doi.org/10.1145/2463209.2488887","url":null,"abstract":"Low-power applications, such as sensing, are becoming increasingly important and demanding in terms of minimizing energy consumption, driving the search for new and innovative interface architectures and technologies. Carbon Nanotube FETs (CNFETs) are excellent candidates for further energy reduction, as CNFET-based digital circuits are projected to potentially achieve an order of magnitude improvement in energy-delay product at highly scaled technology nodes. This paper presents an overview of the first demonstration of a complete sub-system, a sensor interface circuit, implemented entirely using CNFETs. The demonstrated sub-system is an all-digital capacitive sensor to digital converter. The CNFET sensor interface is demonstrated by using the CNFET circuitry to interface with a sensor used to control a handshaking robot.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"42 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120922323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Verification of digitally-intensive analog circuits via kernel ridge regression and hybrid reachability analysis","authors":"H. Lin, Peng Li, C. Myers","doi":"10.1145/2463209.2488814","DOIUrl":"https://doi.org/10.1145/2463209.2488814","url":null,"abstract":"The emergence of digitally-intensive analog circuits introduces new challenges to formal verification due to increased digital design content, and non-ideal digital effects such as finite resolution, round-off error and overflow. We propose a machine learning approach to convert digital blocks to conservative analog approximations via the use of kernel ridge regression. These learned models are then adopted in a hybrid formal reachability analysis framework where the support function based manipulations are developed to efficiently handle the large linear portion of the design and the more general satisfiability modulo theories technique is applied to the remaining nonlinear portion. The efficiency of the proposed method is demonstrated for the locked time verification of a digitally intensive phase locked loop.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"30 23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120958680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RASTER: Runtime adaptive spatial/temporal error resiliency for embedded processors","authors":"Tuo Li, M. Shafique, Jude Angelo Ambrose, Semeen Rehman, J. Henkel, S. Parameswaran","doi":"10.1145/2463209.2488809","DOIUrl":"https://doi.org/10.1145/2463209.2488809","url":null,"abstract":"Applying error recovery monotonously can either compromise the real-time constraint, or worsen the power/energy envelope. Neither of these violations can be realistically accepted in embedded system design, which expects ultra efficient realization of a given application. In this paper, we propose a HW/SW methodology that exploits both application specific characteristics and Spatial/Temporal redundancy. Our methodology combines design-time and runtime optimizations, to enable the resultant embedded processor to perform runtime adaptive error recovery operations, precisely targeting the reliability-wise critical instruction executions. The proposed error recovery functionality can dynamically 1) evaluate the reliability cost economy (in terms of execution-time and dynamic power), 2) determine the most profitable scheme, and 3) adapt to the corresponding error recovery scheme, which is composed of spatial and temporal redundancy based error recovery operations. The experimental results have shown that our methodology at best can achieve fifty times greater reliability while maintaining the execution time and power deadlines, when compared to the state of the art.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121156539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handling design and implementation optimizations in equivalence checking for behavioral synthesis","authors":"Zhenkun Yang, S. Ray, K. Hao, Fei Xie","doi":"10.1145/2463209.2488878","DOIUrl":"https://doi.org/10.1145/2463209.2488878","url":null,"abstract":"Behavioral synthesis involves generating hardware design via compilation of its Electronic System Level (ESL) description to an RTL implementation. Equivalence checking is critical to ensure that the synthesized RTL conforms to its ESL specification. Such equivalence checking must effectively handle design and implementation optimizations. We identify two key optimizations that complicate equivalence checking for behavioral synthesis: (1) operation gating, and (2) global variables. We develop a sequential equivalence checking (SEC) framework to compare ESL designs with RTL in the presence of these optimizations. Our approach can handle designs with more than 32K LoC RTL synthesized from practical ESL designs. Furthermore, our evaluation found a bug in a commercial tool, underlining both the importance of SEC and the effectiveness of our approach.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126324669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constraint abstraction for vectorless power grid verification","authors":"Xuanxing Xiong, Jia Wang","doi":"10.1145/2463209.2488841","DOIUrl":"https://doi.org/10.1145/2463209.2488841","url":null,"abstract":"Vectorless power grid verification is a formal approach to analyze power supply noises across the chip without detailed current waveforms. It is typically formulated and solved as linear programs, which demand intensive computational power, especially for large-scale power grids. In this paper, we propose a constraint abstraction technique to reduce the computation cost of vectorless verification. The boundary condition of a subgrid is modeled by boundary constraints, which enable efficient calculation of conservative bounds of power supply noises in a divide-and-conquer manner. Experimental results show that the proposed approach achieves significant speedup over prior art while maintaining good solution quality.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127408678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient and effective analytical placer for FPGAs","authors":"Tzu-Hen Lin, Pritha Banerjee, Yao-Wen Chang","doi":"10.1145/2463209.2488746","DOIUrl":"https://doi.org/10.1145/2463209.2488746","url":null,"abstract":"The increasing design complexity of modern circuits has made traditional FPGA placement techniques no longer efficient. To improve the scalability, commercial FPGA placement tools have started migrating to analytical placement. In this paper, we propose the first academic multilevel timing-and-wirelength-driven analytical placement algorithm for FPGAs. Our proposed algorithm consists of (1) multilevel timing-and-wirelength-driven analytical global placement with the novel block alignment consideration, (2) partitioning-based legalization, (3) wirelength-driven block matching-based detailed placement, and (4) timing-driven simulated-annealing-based detailed placement. Experimental results show that our proposed approach can achieve 6.91× speedup on average with 7% smaller critical path delay and 1% shorter routed wirelength compared to VPR, the well-known, state-of-the-art academic simulated-annealing-based FPGA placer.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127002124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerators for biologically-inspired attention and recognition","authors":"Mi Sun Park, Chuanjun Zhang, M. DeBole, S. Kestur","doi":"10.1145/2463209.2488900","DOIUrl":"https://doi.org/10.1145/2463209.2488900","url":null,"abstract":"Video and image content has begun to play a growing role in many applications, ranging from video games to autonomous self-driving vehicles. In this paper, we present accelerators for gist-based scene recognition, saliency-based attention, and HMAX-based object recognition that have multiple uses and are based on the current understanding of the vision systems found in the visual cortex of the mammalian brain. By integrating them into a two-level hierarchical system, we improve recognition accuracy and reduce computational time. Results of our accelerator prototype on a multi-FPGA system show real-time performance and high recognition accuracy with large speedups over existing CPU, GPU and FPGA implementations.","PeriodicalId":320207,"journal":{"name":"2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131814753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}