{"title":"Near-optimal, dynamic module reconfiguration in a photovoltaic system to combat partial shading effects","authors":"X. Lin, Yanzhi Wang, Siyu Yue, Donghwa Shin, N. Chang, Massoud Pedram","doi":"10.1145/2228360.2228452","DOIUrl":"https://doi.org/10.1145/2228360.2228452","url":null,"abstract":"Partial shading is a serious obstacle to effective utilization of photovoltaic (PV) systems since it can result in significant output power degradation for the system. A PV system is organized as a series connection of PV modules, each module comprising of a number of series-parallel connected cells. This paper presents modified PV cell structures with integrated switches, imbalanced cell connection topologies for PV modules, and a dynamic programming algorithm to produce near-optimal reconfigurations of each PV module with the goal of maximizing the system output power level under any partial shading patterns. Through simulations, we have demonstrated up to a factor of 2.3X improvement in the output power level of a PV system comprised of 3 PV modules with 60 PV cells per module.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123153652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Observational wear leveling: An efficient algorithm for flash memory management","authors":"Chundong Wang, W. Wong","doi":"10.1145/2228360.2228405","DOIUrl":"https://doi.org/10.1145/2228360.2228405","url":null,"abstract":"In NAND flash memory, wear leveling is employed to evenly distribute program/erase bit flips so as to prevent overall chip failure caused by excessive writes to certain hot spots of the chip. In this paper, we analyze latest wear leveling algorithms, and propose Observational Wear Leveling (OWL). OWL considers the temporal locality of write activities at runtime when blocks are allocated. It also transfers data between blocks of different ages. From our experiments, with minimal additional space and time overhead, OWL can improve wear evenness by as much as 29.9% and 43.2% compared to two state-of-the-art wear leveling algorithms, respectively.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"392 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123093349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rule agnostic routing by using design fabrics","authors":"Gyuszi Suto","doi":"10.1145/2228360.2228443","DOIUrl":"https://doi.org/10.1145/2228360.2228443","url":null,"abstract":"Moore's law requires the shrinking of physical dimensions of the transistors to roughly half their area every two years. This poses a tremendous challenge on how to print and manufacture these ever-shrinking physical components that make up the transistors and the interconnect - generation after process generation. One aspect of this challenge is that the process rules are exploding in complexity - directly translating into physical design EDA (Electronic Design Automation) tool complexity. Traditional design rules governed the spacing, overlap or alignment of any two layout objects from this set: diffusion, poly, via cut, wire, etc. In this work we propose a solution that relies on grids (aka. Fabrics), models the design rules on those grids and presents them to the EDA tools in such a way that it minimizes the complexity cost on the tools' side. In an ideal situation, the proposed solution can completely decouple the tools from the process rules, i.e. even if the tools don't change at all, they'll still be able to support new process nodes.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"461 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115868361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic river network simulation at large scale","authors":"Frank Liu, B. Hodges","doi":"10.1145/2228360.2228491","DOIUrl":"https://doi.org/10.1145/2228360.2228491","url":null,"abstract":"Fully dynamic modeling of large scale river networks is still a challenge. In this paper we describe SPRINT, an interdisciplinary collaborative effort between computer engineering and hydroscience to address the computational aspect of this challenge. Although algorithmic details differ, SPRINT draws many design considerations from SPICE, one of the most fundamental EDA tools. Experimental results demonstrate that SPRINT is capable of simulating large river basins at over 100× faster than real time.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125637738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid NoC design for cache coherence optimization for chip multiprocessors","authors":"Hui Zhao, Ohyoung Jang, W. Ding, Yuanrui Zhang, M. Kandemir, M. J. Irwin","doi":"10.1145/2228360.2228511","DOIUrl":"https://doi.org/10.1145/2228360.2228511","url":null,"abstract":"On chip many-core systems, evolving from prior multi-processor systems, are considered as a promising solution to the performance scalability and power consumption problems. The long communication distance between the traditional multi-processors makes directory-based cache coherence protocols better solutions compared to bus-based snooping protocols even with the overheads from indirections. However, much smaller distances between the CMP cores enhance the reachability of buses, revitalizing the applicability of snooping protocols for cache-to-cache transfers. In this work, we propose a hybrid NoC design to provide optimized support for cache coherency. In our design, on-chip links can be dynamically configured as either point-to-point links between NoC nodes or short buses to facilitate localized snooping. By taking advantage of the best of both worlds, bus-based snooping coherency and NoC-based directory coherency, our approach brings both power and performance benefits.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123029780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SALSA: Systematic logic synthesis of approximate circuits","authors":"Swagath Venkataramani, Amit Sabne, V. Kozhikkottu, K. Roy, A. Raghunathan","doi":"10.1145/2228360.2228504","DOIUrl":"https://doi.org/10.1145/2228360.2228504","url":null,"abstract":"Approximate computing has emerged as a new design paradigm that exploits the inherent error resilience of a wide range of application domains by allowing hardware implementations to forsake exact Boolean equivalence with algorithmic specifications. A slew of manual design techniques for approximate computing have been proposed in recent years, but very little effort has been devoted to design automation. We propose SALSA, a Systematic methodology for Automatic Logic Synthesis of Approximate circuits. Given a golden RTL specification of a circuit and a quality constraint that defines the amount of error that may be introduced in the implementation, SALSA synthesizes an approximate version of the circuit that adheres to the pre-specified quality bounds. We make two key contributions: (i) the rigorous formulation of the problem of approximate logic synthesis, enabling the generation of circuits that are correct by construction, and (ii) mapping the problem of approximate synthesis into an equivalent traditional logic synthesis problem, thereby allowing the capabilities of existing synthesis tools to be fully utilized for approximate logic synthesis. In order to achieve these benefits, SALSA encodes the quality constraints using logic functions called Q-functions, and captures the flexibility that they engender as Approximation Don't Cares (ADCs), which are used for circuit simplification using traditional don't care based optimization techniques. We have implemented SALSA using two off-the-shelf logic synthesis tools - SIS and Synopsys Design Compiler. We automatically synthesize approximate circuits ranging from arithmetic building blocks (adders, multipliers, MAC) to entire datapaths (DCT, FIR, IIR, SAD, FFT Butterfly, Euclidean distance), demonstrating scalability and significant improvements in area (1.1X to 1.85X for tight error constraints, and 1.2X to 4.75X for relaxed error constraints) and power (1.15X to 1.75X for tight error constraints, and 1.3X to 5.25X for relaxed error constraints).","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127016132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Executing synchronous dataflow graphs on a SPM-based multicore architecture","authors":"Junchul Choi, Hyunok Oh, Sungchan Kim, S. Ha","doi":"10.1145/2228360.2228480","DOIUrl":"https://doi.org/10.1145/2228360.2228480","url":null,"abstract":"In this paper we are concerned about executing synchronous dataflow (SDF) applications on a multicore architecture where a core has a limited size of scratchpad memory (SPM). Unlike traditional multi-processor scheduling of SDF graphs, we consider the SPM size limitation that incurs code and data overlay overhead. Since the scheduling problem is intractable, we propose an EA(evolutionary algorithm)-based technique. To hide memory latency, prefetching is aggressively performed in the proposed technique. The experimental results show that our approach reduces the overlay overhead significantly compared to a non-optimized approach and the previous approach.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132190138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive computing with spin-based neural networks","authors":"M. Sharad, C. Augustine, G. Panagopoulos, K. Roy","doi":"10.1145/2228360.2228594","DOIUrl":"https://doi.org/10.1145/2228360.2228594","url":null,"abstract":"We model a step transfer function neuron with lateral spin valve (LSV) and propose its application in low power neural network hardware. The computational task in such a network is performed by nano-magnets, metal channels and programmable conductive elements, that constitute the neuron-synapse units and operate at a terminal voltage of ~20 mV. CMOS transistors provide peripheral support in the form of clocking, power gating and inter-neuron signaling. Simulations for cognitive as well as Boolean computation applications show more than 94% improvement in power consumption as compared to a conventional CMOS design at the same technology node.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115210232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A semiempirical model for wakeup time estimation in power-gated logic clusters","authors":"Vivek D. Tovinakere, O. Sentieys, Steven Derrien","doi":"10.1145/2228360.2228371","DOIUrl":"https://doi.org/10.1145/2228360.2228371","url":null,"abstract":"Wakeup time is an important overhead that must be determined for effective power gating, particularly in logic clusters that undergo frequent mode transitions for run-time leakage power reduction. In this paper, a semiempirical model for virtual supply voltage in terms of basic parameters of the power-gated circuit is presented. Hence a closed-form expression for estimation of wakeup time of a power-gated logic cluster is derived. Experimental results of application of the model to ISCAS85 benchmark circuits show that wakeup time may be estimated within an average error of 16.3% across 22× variation in sleep transistor sizes and 13× variation in circuit sizes with significant speedup in computation time compared to SPICE level circuit simulations.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115494318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing energy efficiency of 3-D multicore systems with stacked DRAM under power and thermal constraints","authors":"Jie Meng, Katsutoshi Kawakami, A. Coskun","doi":"10.1145/2228360.2228477","DOIUrl":"https://doi.org/10.1145/2228360.2228477","url":null,"abstract":"3D multicore systems with stacked DRAM have the potential to boost system performance significantly; however, this performance increase may cause 3D systems to exceed the power budget or create thermal hot spots. This paper introduces a framework to model on-chip DRAM accesses and analyzes performance, power, and temperature tradeoffs of 3D systems. We propose a runtime optimization policy to maximize performance while maintaining power and thermal constraints. Our policy dynamically monitors workload behavior and selects among low-power and turbo operating modes accordingly. Experiments with multithreaded workloads demonstrate up to 49% energy efficiency improvements compared to existing thermal management policies.","PeriodicalId":263599,"journal":{"name":"DAC Design Automation Conference 2012","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123393122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}