{"title":"Flow Time Minimization under Energy Constraints","authors":"Jian-Jia Chen, K. Iwama, Tei-Wei Kuo, Hsueh-I Lu","doi":"10.1109/ASPDAC.2007.358098","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358098","url":null,"abstract":"Power-aware and energy-efficient designs play important roles for modern hardware and software designs, especially for embedded systems. This paper targets a scheduling problem on a processor with the capability of dynamic voltage scaling (DVS), which could reduce the power consumption by slowing down the processor speed. The objective of the targeting problem is to minimize the average flow time of a set of jobs under a given energy constraint, where the flow time of a job is defined as the interval length between the arrival and the completion of the job. We consider two types of processors, which have a continuous spectrum of the available speeds or have only a finite number of discrete speeds. Two algorithms are given: (1) An algorithm is proposed to derive optimal solutions for processors with a continuous spectrum of the available speeds. (2) A greedy algorithm is designed for the derivation of optimal solutions for processors with a finite number of discrete speeds. The proposed algorithms are extended to cope with jobs with different weights for the minimization of the average weighted flow time. The proposed algorithms are also evaluated with comparisons to schedules which execute jobs at a common effective speed.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132954415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianhua Liu, Yi Zhu, Haikun Zhu, Chung-Kuan Cheng, J. Lillis
{"title":"Optimum Prefix Adders in a Comprehensive Area, Timing and Power Design Space","authors":"Jianhua Liu, Yi Zhu, Haikun Zhu, Chung-Kuan Cheng, J. Lillis","doi":"10.1109/ASPDAC.2007.358053","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358053","url":null,"abstract":"Parallel prefix adder is the most flexible and widely-used binary adder for ASIC designs. Many high-level synthesis techniques have been developed to find optimal prefix structures for specific applications. However, the gap between these techniques and back-end designs is increasingly large. In this paper, we propose an integer linear programming method to build minimal-power prefix adders within given timing and area constraints. It counts both gate and wire capacitances in the timing and power models, considers static and dynamic power consumptions, and can handle gate sizing and buffer insertion to improve the performance further. The proposed method is also adaptive for non-uniform arrival time and required time on each bit position. Therefore our method produces the optimum prefix adder for realistic constraints.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126212018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Software Technique to Improve Yield of Processor Chips in Presence of Ultra-Leaky SRAM Cells Caused by Process Variation","authors":"M. Goudarzi, T. Ishihara, H. Yasuura","doi":"10.1109/ASPDAC.2007.358100","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358100","url":null,"abstract":"Exceptionally leaky transistors are increasingly more frequent in nano-scale technologies due to lower threshold voltage and its increased variation. Such leaky transistors may even change position with changes in the operating voltage and temperature, and hence, redundancy at circuit-level is not sufficient to tolerate such threats to yield. We show that in SRAM cells this leakage depends on the cell value and propose a first software-based runtime technique that suppresses such abnormal leakages by storing safe values in the corresponding cache lines before going to standby mode. Analysis shows the performance penalty is, in the worst case, linearly dependent to the number of so-cured cache lines while the energy saving linearly increases by the time spent in standby mode. Analysis and experimental results on commercial processors confirm that the technique is viable if the standby duration is more than a small fraction of a second.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115254934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huan-Kai Peng, Chun-Hsin Lee, Jian-Wen Chen, Tzu-Jen Lo, Y. Chang, Sheng-Tsung Hsu, Yuan-Chun Lin, P. Chao, Wei-Cheng Hung, Kai-Yuan Jan
{"title":"A Highly Integrated 8mW H.264/AVC Main Profile Real-time CIF Video Decoder on a 16MHz SoC Platform","authors":"Huan-Kai Peng, Chun-Hsin Lee, Jian-Wen Chen, Tzu-Jen Lo, Y. Chang, Sheng-Tsung Hsu, Yuan-Chun Lin, P. Chao, Wei-Cheng Hung, Kai-Yuan Jan","doi":"10.1109/ASPDAC.2007.357966","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.357966","url":null,"abstract":"We present a hardwired decoder prototype for H.264/AVC main profile video. Our design takes as its input compressed H.264/AVC bit-stream and produces as its output video frames ready for display. We wrap the decoder core with an AMBA-AHB bus interface and integrate it into a multimedia SoC platform. Several architectural innovations at both IP and system levels are proposed to achieve very high performance at very low operating frequency. Running at 16MHz, our FPGA demo system is able to real-time decode CIF (352 times 288) video at 30 frames per second. Moreover, we take system cost into consideration such that only a single external SDRAM is needed and memory traffic minimized.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125659119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Performance-Driven Topology Design Algorithm","authors":"Min Pan, C. Chu, P. Patra","doi":"10.1109/ASPDAC.2007.357993","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.357993","url":null,"abstract":"This paper presents a very efficient algorithm for performance-driven topology design for interconnects. Given a net, it first generates A-tree topology using table lookup and net-breaking. Then a performance-driven post-processing heuristic not restricting to A-tree topology improves the obtained topology by considering the sink positions, required time and load capacitance to achieve better timing. Experimental results show that our new technique can produce topologies with better timing and is hundreds of times faster than traditional approach.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121766444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of Arithmetic Datapaths with Finite Word-Length Operands","authors":"S. Gopalakrishnan, P. Kalla, Florian Enescu","doi":"10.1109/ASPDAC.2007.358037","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358037","url":null,"abstract":"This paper presents an approach to area optimization of arithmetic datapaths that perform polynomial computations over bit-vectors with finite widths. Examples of such designs abound in DSP for audio, video and multimedia computations where the input and output bit-vector sizes are dictated by the desired precision. A bit-vector of size m represents integer values reduced modulo 2m(%2m). Therefore, finite word-length bit-vector arithmetic can be modeled as algebra over finite integer rings, where the bit-vector size dictates the ring cardinality. This paper demonstrates how the number-theoretic properties of finite integer rings can be exploited for optimization of bit-vector arithmetic. Along with an analytical model to estimate the implementation cost at RTL, two algorithms are presented to optimize bit-vector arithmetic. Experimental results, conducted within practical CAD settings, demonstrate significant area savings due to our approach.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"240 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133445573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Program Phase Directed Dynamic Cache Way Reconfiguration for Power Efficiency","authors":"Subhasish Banerjee, G. Surendra, S. Nandy","doi":"10.1109/ASPDAC.2007.358101","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358101","url":null,"abstract":"Aggressive superscalar processor with deep pipeline and sophisticated speculative execution techniques is pushing the power budget to its limit. It is found that a significant portion of this power is wasted during wrong path execution and non power optimal allocation of power hungry resources. Dynamic reconfiguration of micro-architectural resources can be exploited to bring down this waste at runtime. Lack of architectural method to capture the behavior of a program at runtime makes dynamic reconfiguration a challenge. In this paper we propose a method to characterize program behavior at runtime using conflict miss pattern of a data cache, which in turn identifies different program phases in terms of cache utilization. We use this phase information to enable/disable cache ways dynamically depending on the conflict miss pattern of a program. Using a hardware tracking mechanism we ensure that the program performance (throughput in terms of IPC) does not degrade beyond a tolerable limit. Through simulation we establish that an average improvement of 32% (best case 38%) in cache power saving is achieved at the expense of less than 2% degradation in performance for SPEC-CPU and MEDIA benchmarks. The additional hardware that detects and captures the phase information is outside the critical path of the processor and does not contribute to the overall delay.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133590323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li-Chun Lin, Shih-Hao Ou, Tay-Jyi Lin, Siang-Den Deng, Chih-Wei Liu
{"title":"Single-Issue 1500MIPS Embedded DSP with Ultra Compact Codes","authors":"Li-Chun Lin, Shih-Hao Ou, Tay-Jyi Lin, Siang-Den Deng, Chih-Wei Liu","doi":"10.1109/ASPDAC.2007.357965","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.357965","url":null,"abstract":"The performance of single-issue RISC cores can be improved significantly with multi-issue architectures (i.e. superscalar or VLIW) by activating the parallel functional units concurrently. However, they suffer high complexity or huge code sizes. In this paper, we borrow some ideas from old vector machines and propose a novel DSP architecture with very compact codes. In our simulations, the DSP has comparable performance to a 5-issue VLIW core with identical computing resources. However, its code sizes are greatly reduced. The DSP core has been implemented in the TSMC 0.13 mum CMOS technology, where the operating frequency is 305MHz and the core size is 1.45 times 1.4 mm2 including 12KB on-chip memory.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"127 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134063730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Computation of Statistically Critical Sequential Paths Under Retiming","authors":"M. Ekpanyapong, Xin Zhao, S. Lim","doi":"10.1109/ASPDAC.2007.358043","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358043","url":null,"abstract":"In this paper we present the statistical retiming-based timing analysis (SRTA) algorithm. The goal is to compute the timing slack distribution for the nodes in the timing graph and identify the statistically critical paths under retiming, which are the paths with a high probability of becoming timing-critical after retiming. SRTA enables the designers to perform circuit optimization on these paths to reduce the probability of them becoming timing bottleneck if the circuit is retimed as a post-process. We provide a comparison among static timing analysis (=STA), statistical timing analysis (=SSTA), retiming-based timing analysis (=RTA), and our statistical retiming-based timing analysis (SRTA). Our results show that the placement optimization based on SRTA achieves the best performance results.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115595440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FastPlace 3.0: A Fast Multilevel Quadratic Placement Algorithm with Placement Congestion Control","authors":"Natarajan Viswanathan, Min Pan, C. Chu","doi":"10.1109/ASPDAC.2007.357975","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.357975","url":null,"abstract":"In this paper, we present FastPlace 3.0 - an efficient and scalable multilevel quadratic placement algorithm for large-scale mixed-size designs. The main contributions of our work are: (1) A multilevel global placement framework, by incorporating a two-level clustering scheme within the flat analytical placer FastPlace (Viswanathan and Chu, 2005) and Viswanathan et al., 2006), (2) An efficient and improved iterative local refinement technique that can handle placement blockages and placement congestion constraints. (3) A congestion aware standard-cell legalization technique in the presence of blockages. On the ISPD-2005 placement benchmarks (Nam et al., 2005), our algorithm is 5.12times, 11.52times and 16.92times faster than mPL6, Capo10.2 and APlace2.0 respectively. In terms of wirelength, we are on average, 2% higher as compared to mPL6 and 9% and 3% better as compared to Capo10.2 and APlace2.0 respectively. We also achieve competitive results compared to a number of academic placers on the placement congestion constrained ISPD-2006 placement benchmarks (Nam, 2006).","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114380822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}