Lei Zhao, D. Ikebuchi, Yoshiki Saito, M. Kamata, N. Seki, Y. Kojima, H. Amano, S. Koyama, T. Hashida, Y. Umahashi, D. Masuda, K. Usami, K. Kimura, M. Namiki, S. Takeda, Hiroshi Nakamura, Masaaki Kondo
{"title":"Geyser-2: The second prototype CPU with fine-grained run-time power gating","authors":"Lei Zhao, D. Ikebuchi, Yoshiki Saito, M. Kamata, N. Seki, Y. Kojima, H. Amano, S. Koyama, T. Hashida, Y. Umahashi, D. Masuda, K. Usami, K. Kimura, M. Namiki, S. Takeda, Hiroshi Nakamura, Masaaki Kondo","doi":"10.1109/ASPDAC.2011.5722310","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722310","url":null,"abstract":"Geyser-2 is the second prototype MIPS CPU which provides a fine-grained run-time power gating (PG) controlled by instructions. Geyser-1[1], the first prototype only provides the fine-grained run-time PG core. Although it demonstrated the leakage power reduction on a real chip, the operational frequency is limited at 60MHz because of the limitation of the I/O speed. Geyser-2 with cache and TLB mechanism is implemented to show (1) run-time PG works at least with 200MHz which is commonly used clock for embedded systems, and (2) it is also efficient on the environment with real application programs with an operating system.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128211706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Jitter amplifier for oscillator-based true random number generator","authors":"Takehiko Amaki, M. Hashimoto, T. Onoye","doi":"10.1587/TRANSFUN.E96.A.684","DOIUrl":"https://doi.org/10.1587/TRANSFUN.E96.A.684","url":null,"abstract":"This paper presents a jitter amplifier for oscillator-based TRNG (true random number generator). The proposed jitter amplifier fabricated in a 65nm CMOS process occupying the area of 3,300 μm2 archives 8.4× gain at 25°C and significantly improves the entropy enough to pass randomness test.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129414502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TurboVG: A HW/SW co-designed multi-core OpenVG accelerator for vector graphics applications with embedded power profiler","authors":"Shuo-Hung Chen, Hsiao-Mei Lin, Ching-Chou Hsieh, Chih-Tsun Huang, J. Liou, Yeh-Ching Chung","doi":"10.1109/ASPDAC.2011.5722315","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722315","url":null,"abstract":"TurboVG is a hardware accelerator for the OpenVG 1.1 library that operates sixteen times faster than an optimized software implementation. This improved efficiency stems from a well-designed hardware-software interaction capable of handling massive data transfers across hierarchical layers without performance loss. By combining multiple TurboVG cores, the library can support screen resolutions of up to Full-HD 1080p.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130069946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient multi-layer obstacle-avoiding preferred direction rectilinear Steiner tree construction","authors":"Jia-Ru Chuang, Jai-Ming Lin","doi":"10.1109/ASPDAC.2011.5722246","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722246","url":null,"abstract":"Constructing rectilinear Steiner trees for signal nets is a very important procedure for placement and routing because we can use it to find topologies of nets and measure the design quality. However, in modern VLSI designs, pins are located in multiple routing layers, each routing layer has its own preferred direction, and there exist numerous routing obstacles incurred from IP blocks, power networks, pre-routed nets, etc, which make us need to consider multilayer obstacle-avoiding preferred direction rectilinear Steiner minimal tree (ML-OAPDRSMT) problem. This significantly increases the complexity of the problem, and an efficient and effective algorithm to deal with the problem is desired. In this paper, we propose a very simple and effective approach to deal with ML-OAPDRSMT problem. Unlike previous works usually build a spanning graph and find a spanning tree to deal with this problem, which takes a lot of time, we first determine a connection ordering for all pins, and then iteratively connect every two neighboring pins by a greedy heuristic algorithm. The experimental results show that our method has average 5.78% improvement over [7] and at least five times speed up comparing with their approach.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114963641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EUV lithography: Prospects and challenges","authors":"S. Sivakumar","doi":"10.1109/ASPDAC.2011.5722221","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722221","url":null,"abstract":"Integrated circuit scaling as codified in Moore's Law has been enabled through the tremendous advances in lithographic patterning technology over multiple process generations. Optical lithography has been the mainstay of patterning technology to date. Its imminent demise has been oft proclaimed over the years but clever engineering has consistently been able to extend it through many lens size and wavelength changes. NA has increased steadily from about 0.3 to 1.35 today with improvements in lens design and the use of immersion lithography. Simultaneously, the illumination wavelength has been reduced from 436nm about 20 years ago to 193nm for state-of-the-art scanners today. However, this approach has reached its limits. The 22nm technology node, targeted for HVM in 2011, represents the last instance of using standard 1.35NA immersion lithography-based patterning for the critical layers with a k¡ hovering right around the 0.3 value that is considered acceptable for manufacturability For the 14nm node with a HVM date of 2013, one has to resort to double patterning to achieve a manufacturable. k1 For the 10nm technology node with a 2015 HVM date, double patterning will also be insufficient. While further ArF extension schemes are being considered, the industry is working towards lowering the wavelength from 193nm to Extreme Ultraviolet Lithography with a λ of 13.5nm. EUV offers the prospect of operating at significantly higher k¡ and as a consequence, much simpler design rules and potentially simpler OPC. However, the technical challenges are formidable. EUV lithography requires the re-engineering of every subsystem in the optical path — source, collector and projection optics, reticles and photoresists. A huge industry-wide effort is under way to solve these technical issues and bring 13.5nm EUV lithography to production. Two main approaches are being considered for EUV sources — Laser Produced Plasma (LPP) and Discharge Produced Plasma (DPP). Both approaches appear to be heading towards production and it remains to be seen if one approach is more scalable to higher power levels. Currently however, neither approach is close to the power levels required to deliver runrates that will have a reasonable Cost of Ownership (COO). Clearly, a lot of development is ahead to make this happen. Photoresists have also seen a significant amount of technical development, primarily using small field Micro Exposure Tools (MET). Photoresist companies are working on developing the chemical platforms needed for EUV photoresists. While much progress has been made on photospeed, resolution and linewidth roughness, further improvements are required to meet the needs of the 14nm and 10nm process nodes. Since EUV employs reflective optics, EUV reticles are reflective as well and this poses several challenges. Apart from the obvious complexities of EUV reticle manufacturing, defectivity is a major concern, both from the standpoint of making defect-free masks as well as fro","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116059144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongbo Zhang, Yuelin Du, Martin D. F. Wong, Kai-Yuan Chao
{"title":"Mask cost reduction with circuit performance consideration for self-aligned double patterning","authors":"Hongbo Zhang, Yuelin Du, Martin D. F. Wong, Kai-Yuan Chao","doi":"10.1109/ASPDAC.2011.5722296","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722296","url":null,"abstract":"Double patterning lithography (DPL) is the enabling technology for printing in sub-32nm nodes. In the EDA literature, researchers have been focusing on double-exposure double-patterning (DEDP) DPL for printing arbitrary 2D features where the layout decomposition problem for double exposure is an interesting graph coloring problem. But due to overlay errors, it is very difficult for DEDP to print even 1D features. A more promising DPL technology is self-aligned double patterning (SADP) for 1D design. SADP first prints dense lines and then trims away the portions not on the design by a cut mask. The complexity of cut mask is very high, adding to the skyrocketing manufacturing cost. In this paper we present a mask cost reduction method with circuit performance consideration for SADP. This is the first paper to focus on the mask cost reduction issue for SADP from a design perspective. We simplify the polygons on the cut mask, by formulating the problem as a constrained shortest path problem. Experimental results show that with a set of layouts in 28nm technology, we can largely reduce the complexity of cut polygons, with little impact on performance.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128681569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-time adaptable on-chip thermal triggers","authors":"Pratyush Kumar, David Atienza Alonso","doi":"10.1109/ASPDAC.2011.5722194","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722194","url":null,"abstract":"With ever-increasing power densities, Dynamic Thermal Management (DTM) techniques have become mainstream in today's systems. An important component of such techniques is the thermal trigger. It has been shown that predictive thermal triggers can outperform reactive ones [4]. In this paper, we present a novel trade-off space of predictive thermal triggers, and compare different approaches proposed in the literature. We argue that run-time adaptability is a crucial parameter of interest. We present a run-time adaptable thermal simulator compatible with arbitrary sensor configuration based on the Neural Network (NN) simulator presented in [14]. We present experimental results on Niagara UltraSPARC T1 chip with real-life benchmark applications. Our results quantitatively establish the effectiveness of the proposed simulator for reducing (by up to 90%), the otherwise unacceptably high errors, that can arise due to expected leakage current variation and design-time thermal modeling errors.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132798547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 80–400 MHz 74 dB-DR Gm-C low-pass filter with a unique auto-tuning system","authors":"Ting Gao, Wei Li, Ning Li, Junyan Ren","doi":"10.1109/ASPDAC.2011.5722165","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722165","url":null,"abstract":"A CMOS 80–400 MHz 5TH order Chebyshev Gm-C low-pass filter with a unique auto tuning system is presented. The filter was designed and fabricated with TSMC 0.13-μm RF CMOS process. Experimental results show that the cut-off frequency of the filter can be tuned between 80–400 MHz, with a tuning step less than 7MHz and an average tuning error of 3.6%. The filter also realizes gain of 0–30 dB, IIP3 of 16.5 dBm and NF of 14–18 dB. Dynamic range of the filter for THD less than 1% is 74 dB. Power dissipation is only 9 mW with 1.2 V supply voltage.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121804222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AdaMS: Adaptive MLC/SLC phase-change memory design for file storage","authors":"Xiangyu Dong, Yuan Xie","doi":"10.1109/ASPDAC.2011.5722206","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722206","url":null,"abstract":"Phase-change memory (PCM) is an emerging memory technology that has made rapid progress in the recent years, and surpasses other technologies such as FeRAM and MRAM in terms of scalability. Recently, the feasibility of multi-level cell (MLC) for PCM, which enables a cell to store more than one bit of digital data, has also been shown. This new property makes PCM more competitive and considered as the successor of the NAND flash technology, which also has the MLC capability but does not have an easy scaling path to reach higher densities. However, the MLC capability of PCM comes with the penalty of longer programming time and shortened cell lifetime compared to its single-level cell (SLC) mode. Therefore, it suggests an adaptive MLC/SLC reconfigurable PCM design that can exploit the fast SLC access speed and the large MLC capacity with the awareness of workload characteristics and lifetime requirements. In this work, a circuit-level adaptive MLC/SLC PCM array is designed at first, the management policy of MLC/SLC mode is proposed, and finally the performance and lifetime of a novel PCM-based SSD with run-time MLC/SLC reconfiguration ability is evaluated1.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121994975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A practical method for multi-domain clock skew optimization","authors":"Yanling Zhi, H. Zhou, Xuan Zeng","doi":"10.1109/ASPDAC.2011.5722245","DOIUrl":"https://doi.org/10.1109/ASPDAC.2011.5722245","url":null,"abstract":"Clock skew scheduling is an effective technique in performance optimization of sequential circuits. However, with process variations, it becomes more difficult to reliably implement a wide spectrum of clock delays at the registers. Multidomain clock skew scheduling is a good option to overcome this limitation. In this paper, we propose a practical method to efficiently and optimally solve this problem. A framework based on branch-and-bound is carefully designed to search for the optimal clocking domain assignment, and a greedy clustering algorithm is developed to quickly estimate the upper bound of cycle period for a given branch. Experiment results on ISCAS89 sequential benchmarks show both the optimality and efficiency of our method compared with previous works.","PeriodicalId":316253,"journal":{"name":"16th Asia and South Pacific Design Automation Conference (ASP-DAC 2011)","volume":"17 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125940781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}