{"title":"Networked architecture for hybrid electrical energy storage systems","authors":"Younghyun Kim, Sangyoung Park, N. Chang, Q. Xie, Yanzhi Wang, Massoud Pedram","doi":"10.1145/2228360.2228453","DOIUrl":"https://doi.org/10.1145/2228360.2228453","url":null,"abstract":"A hybrid electrical energy storage (HEES) system that consists of multiple, heterogeneous electrical energy storage (EES) elements is a promising solution to achieve a cost-effective EES system because no storage element has ideal characteristics. The state-of-the-art HEES systems are based on a shared-bus charge transfer interconnect (CTI) architecture. Consequently, they are quite limited in scalability which is a function of the number of EES banks. This paper is the first introduction of a HEES system based on a networked CTI architecture, which is highly scalable and is capable of accommodating multiple, concurrent charge transfers. The paper starts by presenting a router architecture for the networked CTI and an effective on-line routing algorithm for multiple charge transfers. In the proposed algorithm, negotiated congestion (NC) routing for multiple charge transfers is performed and any lack of routing resources is addressed by merging two or more charge transfers while maximizing the overall energy efficiency by setting the optimal voltage level for the shared CTI. Examples of the proposed networked CTI are presented and the efficacy of the routing algorithm is demonstrated on a mesh-grid networked CTI.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124856563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous multi-channel: Fine-grained DRAM control for both system performance and power efficiency","authors":"Guangfei Zhang, Huandong Wang, Xinke Chen, Shuai Huang, Peng Li","doi":"10.1145/2228360.2228517","DOIUrl":"https://doi.org/10.1145/2228360.2228517","url":null,"abstract":"We propose a novel architecture of memory controller, called HMC (Heterogeneous Multi-Channel), as an improvement to the previous homogeneous multi-channel memory controller. HMC groups physical DRAM devices into logical sub-ranks with different data bus width, and controls them simultaneously. Employing a newly proposed memory access algorithm, HMC manages the number of devices involved in a single memory access flexibly, and achieves the best performance/power efficiency. Using four-core multiprogramming workloads, our experimental results show that HMC improves system performance by 27.6% with 24.2% reduction in DRAM power consumption on average.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128662332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A compiler and runtime for heterogeneous computing","authors":"Joshua S. Auerbach, D. F. Bacon, Ioana Baldini, P. Cheng, Stephen J. Fink, R. Rabbah, Sunil Shukla","doi":"10.1145/2228360.2228411","DOIUrl":"https://doi.org/10.1145/2228360.2228411","url":null,"abstract":"Heterogeneous systems show a lot of promise for extracting high-performance by combining the benefits of conventional architectures with specialized accelerators in the form of graphics processors (GPUs) and reconfigurable hardware (FPGAs). Extracting this performance often entails programming in disparate languages and models, making it hard for a programmer to work equally well on all aspects of an application. Further, relatively little attention is paid to co-execution - the problem of orchestrating program execution using multiple distinct computational elements that work seamlessly together. We present Liquid Metal, a comprehensive compiler and runtime system for a new programming language called Lime. Our work enables the use of a single language for programming heterogeneous computing platforms, and the seamless co-execution of the resultant programs on CPUs and accelerators that include GPUs and FPGAs. We have developed a number of Lime applications, and successfully compiled some of these for co-execution on various GPU and FPGA enabled architectures. Our experience so far leads us to believe the Liquid Metal approach is promising and can make the computational power of heterogeneous architectures more easily accessible to mainstream programmers.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129126337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards efficient SPICE-accurate nonlinear circuit simulation with on-the-fly support-circuit preconditioners","authors":"Xueqian Zhao, Zhuo Feng","doi":"10.1145/2228360.2228564","DOIUrl":"https://doi.org/10.1145/2228360.2228564","url":null,"abstract":"SPICE-accurate simulation of present-day large-scale nonlinear integrated circuit (IC) systems with millions of linear/nonlinear components can be prohibitively expensive, and thus extremely challenging. In this paper, we present a novel support-circuit preconditioning (SCP) technique for tackling large-scale nonlinear circuit simulations by exploiting sparsified graphs of a given circuit network. By extracting support graphs (SGs) from the original linear circuit networks, and combining them with nonlinear devices, the support-circuit preconditioner can be efficiently computed using existing matrix solvers, allowing for on-the-fly updates during transient simulations when adopted in Krylov-subspace iterative solvers. Experimental results for a variety of large-scale circuit designs show that the proposed method achieves up to 22X speedups in solving the matrices involved in DC and transient (TR) simulations, and up to 8X reduction in memory usage, when compared with the simulator powered by the state-of-the-art direct solver KLU.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131058938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Static dataflow with access patterns: Semantics and analysis","authors":"Arkadeb Ghosal, Rhishikesh Limaye, K. Ravindran, S. Tripakis, A. Prasad, Guoqiang Wang, Trung N. Tran, H. Andrade","doi":"10.1145/2228360.2228479","DOIUrl":"https://doi.org/10.1145/2228360.2228479","url":null,"abstract":"Signal processing and multimedia applications are commonly modeled using Static/Cyclo-Static Dataflow (SDF/CSDF) models. SDF/CSDF explicitly specifies how much data is produced and consumed per firing during computation. This results in strong compile-time analyzability of many useful execution properties such as deadlock absence, channel boundedness, and throughput. However, SDF/CSDF is limited in its ability to capture how data is accessed in time. Hence, using these models often leads to implementations that are suboptimal (i.e., use more resources than necessary) or even incorrect (i.e., use insufficient resources). In this work, we advance a new model called Static Dataflow with Access Patterns (SDF-AP) that captures the timing of data accesses (for both production and consumption). This paper formalizes the semantics of SDF-AP, defines key properties governing model execution, and discusses algorithms to check these properties under correctness and resource constraints. Results are presented to evaluate these analysis algorithms on practical applications modeled by SDF-AP.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"177 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131003201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early prediction of NBTI effects using RTL source code analysis","authors":"Jayanand Asok Kumar, K. Butler, Heesoo Kim, Shobha Vasudevan","doi":"10.1145/2228360.2228506","DOIUrl":"https://doi.org/10.1145/2228360.2228506","url":null,"abstract":"In present day technology, the design of reliable systems must factor in temporal degradation due to aging effects such as Negative Bias Temperature Instability (NBTI). In this paper, we present a methodology to estimate delay degradation early at the Register Transfer Level (RTL). We statically analyze the RTL source code to determine signal correlations. We then determine probability distributions of RTL signals formally by using probabilistic model checking. Finally, we propagate these signal probabilities through delay macromodels and estimate the delay degradation. We demonstrate our methodology on several benchmark RTL designs. We estimate the degradation with <10% error and up to 18.2× speedup in runtime as compared to estimation using gate-level simulations.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127986107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PADE: A high-performance placer with automatic datapath extraction and evaluation through high-dimensional data learning","authors":"Samuel I. Ward, Duo Ding, D. Pan","doi":"10.1145/2228360.2228497","DOIUrl":"https://doi.org/10.1145/2228360.2228497","url":null,"abstract":"This work presents PADE, a new placement flow with automatic datapath extraction and evaluation. PADE applies novel data learning techniques to train, predict, and evaluate potential datapaths using high-dimensional data such as netlist symmetrical structures, initial placement hints and relative area. Extracted datapaths are mapped to bit-stack structures that are aligned and simultaneously placed with the random logic using SAPT [1], a placer built on top of SimPL [2]. Results show at least 7% average total Half-Perimeter Wire Length (HPWL) and 12% Steiner Wire Length (StWL) improvements on industrial hybrid benchmarks and at least 2% average total HPWL and 3% StWL improvements on ISPD 2005 contest benchmarks. To the best of our knowledge, this is the first attempt to link data learning, datapath extraction with evaluation, and placement, and it has tremendous potential for pushing the placement state of the art for modern circuits that contain both datapath and random logic.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131716244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the asymptotic costs of multiplexer-based reconfigurability","authors":"Johnathan York, Derek Chiou","doi":"10.1145/2228360.2228503","DOIUrl":"https://doi.org/10.1145/2228360.2228503","url":null,"abstract":"Existing literature documents a number of techniques for combining a set of independent datapath designs into a single datapath that is run-time configurable to the functionality of any datapath in the set. This paper explores how delay, energy and area overhead attributable to reconfigurability scales with the number of configurable functionalities, independent of the design of specific datapaths. Distinct design space regions are identified based upon common scaling properties, with implications on the design and feasible efficiency bounds of reconfigurable devices.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125548734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STM concurrency control for embedded real-time software with tighter time bounds","authors":"Mohammed El-Shambakey, B. Ravindran","doi":"10.1145/2228360.2228437","DOIUrl":"https://doi.org/10.1145/2228360.2228437","url":null,"abstract":"We consider software transactional memory (STM) concurrency control for multicore real-time software, and present a novel contention manager (CM) for resolving transactional conflicts, called length-based CM (or LCM). We upper bound transactional retries and response times under LCM, when used with G-EDF and G-RMA schedulers. We identify the conditions under which LCM outperforms previous real-time STM CMs and lock-free synchronization. Our implementation and experimental studies reveal that G-EDF/LCM and G-RMA/LCM have retry costs and response times that are shorter than or comparable to those of other synchronization techniques.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125586217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computer generation of streaming sorting networks","authors":"M. Zuluaga, Peter Milder, Markus Püschel","doi":"10.1145/2228360.2228588","DOIUrl":"https://doi.org/10.1145/2228360.2228588","url":null,"abstract":"Sorting networks offer great performance but become prohibitively expensive for large data sets. We present a domain-specific language and compiler to automatically generate hardware implementations of sorting networks with reduced area and optimized for latency or throughput. Our results show that the generator produces a wide range of Pareto-optimal solutions that both compete with and outperform prior sorting hardware.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126737656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}