Yi Zou, Yici Cai, Qiang Zhou, Xianlong Hong, S. Tan, Le Kang
{"title":"Practical Implementation of Stochastic Parameterized Model Order Reduction via Hermite Polynomial Chaos","authors":"Yi Zou, Yici Cai, Qiang Zhou, Xianlong Hong, S. Tan, Le Kang","doi":"10.1109/ASPDAC.2007.358013","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358013","url":null,"abstract":"This paper describes the stochastic model order reduction algorithm via stochastic Hermite polynomials from a practical implementation perspective. Compared with existing work on stochastic interconnect analysis and parameterized model order reduction, we generalize the input variation representation using polynomial chaos (PC) to allow accurate modeling of non-Gaussian input variations. We also explore an implicit system representation using sub-matrices and improve the efficiency of solving the linear equations by exploiting the block matrix structure of the augmented system. Experiments show that our algorithm matches Monte Carlo methods very well while keeping the algorithm effective. Moreover, the PC representation of non-Gaussian variables is more accurate than the Taylor representation used in previous work (Wang et al., 2004).","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123954184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Siljenberg, S. Baumgartner, T.C. Buchholtz, M. Maxson, T. Timpane, Jeff Johnson
{"title":"Xbox360 Front Side Bus - A 21.6 GB/s End-to-End Interface Design","authors":"D. Siljenberg, S. Baumgartner, T.C. Buchholtz, M. Maxson, T. Timpane, Jeff Johnson","doi":"10.1109/ASPDAC.2007.358095","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358095","url":null,"abstract":"With a bandwidth of 21.6 GB/s, the front side bus (FSB) of the Microsoft Xbox 360™ is one of the fastest commercially available front side bus interfaces in the consumer market. This paper explains the end-to-end system approach used in designing the bus, which achieved a volume production ramp 18 months after design start. The 90 nm SOI-CMOS CPU and 90 nm bulk CMOS GPU designs are described. The chip carrier, circuit board, and signal integrity analyses are described. The design approach used to achieve high volume, low cost, and short development time is explained.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"327 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114004727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Technique to Reduce Peak Current and Average Power Dissipation in Scan Designs by Limited Capture","authors":"Seongmoon Wang, Wenlong Wei","doi":"10.1109/ASPDAC.2007.358089","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358089","url":null,"abstract":"In this paper, a technique that can efficiently reduce peak and average switching activity during test application is proposed. The proposed method does not require any specific clock tree construction, special scan cells, or scan chain reordering. Test cubes generated by any combinational ATPG can be processed by the proposed method to reduce peak and average switching activity without any capture violation. Switching activity during scan shift cycles is reduced by assigning identical values to adjacent scan inputs, and switching activity during capture cycles is reduced by limiting the number of scan chains that capture responses. Hardware overhead for the proposed method is negligible. Peak transitions are reduced by about 40% and the average number of transitions by about 56-85%. This reduction in peak and average switching activity is achieved with no decrease in fault coverage.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131842929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liang Deng, Martin D. F. Wong, Kai-Yuan Chao, Hua Xiang
{"title":"Coupling-aware Dummy Metal Insertion for Lithography","authors":"Liang Deng, Martin D. F. Wong, Kai-Yuan Chao, Hua Xiang","doi":"10.1109/ASPDAC.2007.357785","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.357785","url":null,"abstract":"As integrated circuit manufacturing technology advances into the 65nm and 45nm nodes, extensive resolution enhancement techniques (RETs) are needed to correctly manufacture a chip design. The widely used RET called off-axis illumination (OAI) introduces forbidden pitches, which lead to very complex design rules. It has been observed that imposing uniformity on layout designs can substantially improve printability under OAI. For metal layers, uniformity can be achieved simply by inserting dummy metal wire segments in all free spaces. Simulation results indeed show significant improvement in printability with such a dummy metal insertion approach. To minimize mask cost, it is advantageous to use dummy metal segments of the same size as regular metal wires because of their simple geometry. However, these dummy wires are printable and hence increase coupling capacitances and potentially affect yield. The alternative is to replace a printable dummy wire segment with a set of parallel sub-resolution thin wires (which are not printed). These invisible dummy metal segments do not increase coupling capacitances but incur a higher lithography cost, which includes mask cost and RET/process expense. This paper presents a strategy for dummy metal insertion that optimally trades off lithography cost and coupling capacitance. In particular, we present an optimal algorithm that minimizes lithography cost subject to any given coupling capacitance bound. Moreover, this dummy metal insertion achieves a highly uniform density because of the locality of coupling capacitance, which automatically ameliorates chemical mechanical polishing (CMP) problems.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129401518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical Optimization Methodology for Wideband Low Noise Amplifiers","authors":"A. Nieuwoudt, T. Ragheb, Y. Massoud","doi":"10.1109/ASPDAC.2007.357794","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.357794","url":null,"abstract":"In this paper, we present a systematic synthesis methodology for fully integrated wideband low noise amplifiers that simultaneously optimizes impedance matching, noise figure, and other performance parameters. Leveraging an accurate analytical model, we hierarchically couple global optimization techniques with local convex optimization methods to efficiently locate optimal wideband LNA circuits. The results indicate that the methodology yields significant improvement in key LNA design constraints over existing methodologies while achieving up to one order of magnitude speedup in computational performance.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117315945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shelf Packing to the Design and Optimization of A Power-Aware Multi-Frequency Wrapper Architecture for Modular IP Cores","authors":"Dan Zhao, Unni Chandran, H. Fujiwara","doi":"10.1109/ASPDAC.2007.358071","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358071","url":null,"abstract":"This paper proposes a novel power-aware multi-frequency wrapper architecture design to achieve at-speed testability. The trade-offs between power dissipation, scan time and bandwidth are handled by gating off certain virtual cores at a time while parallelizing the remaining ones. A shelf packing based optimization algorithm is proposed to design and optimize the wrapper architecture while minimizing the test time under power and bandwidth constraints.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133670190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FastRoute 2.0: A High-quality and Efficient Global Router","authors":"Min Pan, C. Chu","doi":"10.1109/ASPDAC.2007.357994","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.357994","url":null,"abstract":"Because of the increasing dominance of interconnect issues in advanced IC technology, it is desirable to incorporate global routing into early design stages to obtain accurate interconnect information. Hence, high-quality and fast global routers are in great demand. In this paper, we propose two major techniques to improve the solution quality of the extremely fast global router FastRoute (Pan and Chu, 2006): (1) monotonic routing and (2) multi-source multi-sink maze routing. The new router is called FastRoute 2.0. Experimental results show that FastRoute 2.0 generates high-quality routing solutions with fast runtime compared with three state-of-the-art academic global routers: FastRoute, Labyrinth (Kastner et al., 2000) and the Chi Dispersion router (Hadsell and Madden, 2003). On the set of benchmarks used in Pan and Chu (2006) and Hadsell and Madden (2003), the total overflow of FastRoute 2.0 is 98, compared to 1012 (FastRoute), 2846 (Labyrinth) and 1271 (Chi Dispersion router). FastRoute 2.0 is 73% slower than FastRoute, but 78 times and 37 times faster than Labyrinth and the Chi Dispersion router, respectively. These promising results make it possible to integrate global routing into early design stages, which could dramatically improve design solution quality.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128835750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"System Co-Design and Co-Analysis Approach to Implementing the XDR Memory System of the Cell Broadband Engine Processor; Realizing 3.2 Gbps Data Rate per Memory Lane in Low Cost, High Volume Production","authors":"Wai-Yeung Yip, S. Best, W. Beyene, R. Schmitt","doi":"10.1109/ASPDAC.2007.358097","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358097","url":null,"abstract":"This paper describes the design and analysis of the 3.2 Gbps XDR™ memory system of the Cell Broadband Engine™ (Cell BE) processor developed by Sony Corporation, Sony Computer Entertainment, Toshiba and IBM. A system co-design and co-analysis approach was applied, in which different components of the system are designed and analyzed simultaneously so that trade-offs can be made to optimize system electrical characteristics at low overall system cost. The XDR memory interface circuit implemented in the Cell BE processor, the power delivery system design and analysis, and the statistical signal integrity analysis of the interface are described to illustrate this design and analysis approach.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115861970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CLIPPER: Counter-based Low Impact Processor Power Estimation at Run-time","authors":"Jorgen Peddersen, S. Parameswaran","doi":"10.1109/ASPDAC.2007.358102","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.358102","url":null,"abstract":"Numerous dynamic power management techniques have been proposed which utilize knowledge of processor power/energy consumption at run-time. So far, no efficient method to provide run-time power/energy data has been presented. Current measurement systems draw too much power to be used in small embedded designs, and existing performance counters cannot provide sufficient information for run-time optimization. This paper presents a novel methodology to solve the problem of run-time power optimization by designing a processor that estimates its own power/energy consumption. Estimation is performed by adding small counters that tally events which consume power. This methodology has been applied to an existing processor, resulting in an average power error of 2% and an energy estimation error of 1.5%. The system has little impact on the design, with only a 4.9% increase in chip area and a 3% increase in average power consumption. A case study of an application that utilizes the processor showcases the benefits the methodology enables in dynamic power optimization.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123530386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 1Tb/s 3W Inductive-Coupling Transceiver Chip","authors":"N. Miura, T. Kuroda","doi":"10.1109/ASPDAC.2007.357798","DOIUrl":"https://doi.org/10.1109/ASPDAC.2007.357798","url":null,"abstract":"A 1Tb/s 3W inter-chip transceiver transmits clock and data by inductive coupling at a clock rate of 1GHz and a data rate of 1Gb/s per channel. 1024 data transceivers are arranged with a pitch of 30 μm in a layout area of 1 mm². The total layout area including 16 clock transceivers is 2 mm² in 0.18 μm CMOS, and the chip thickness is reduced to 10 μm. A simple yet accurate model of inductive coupling is utilized for transceiver design. Bi-phase modulation (BPM) is employed for the data link to improve noise immunity, reducing power in the transceiver. 4-phase time division multiplexing (TDM) reduces crosstalk and channel pitch. The BER is lower than 10⁻¹³ with a 150 ps timing margin.","PeriodicalId":362373,"journal":{"name":"2007 Asia and South Pacific Design Automation Conference","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127462244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}