{"title":"A DyadicCluster method used for nonlinear placement","authors":"Wenchao Gao, Qiang Zhou, Xu Qian, Yici Cai","doi":"10.1109/ISQED.2012.6187527","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187527","url":null,"abstract":"The executing analytical global placers on a flat cell design might give the best placement results, but nevertheless can incur extremely long runtime, especially when the scale of netlists is increasing dramatically nowadays. And in the flatten mode experiments, the results reveal some unexpected situations like several cells with same logical connections are inseparable. Clustering offers an available solution to above questions and give an attractive choice to reduce the scale and complexity of the design, at the same time improve the placement quality. In this paper, we present a new cluster technique called DyadicCluster to consider the internal and external connections between two cells and add the area constraints, so as to combine the most closely two cells, which is more quick and accurate than previous methods. DyadicCluster has been embedded into the global placement process of a nonlinear placer DCNP for large-scale designs. The DCNP runtime explicitly decreases compared to the flatten mode [1] by 40% and the quality improves by 12%. 
And the half-perimeter wirelength of our placer after detail placement outperforms current state-of-the-art placers Capo, FastPlace, Fengshui and mPL5-fast by 7%, 9%, 1%, and 5% respectively.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128024790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of transistor aging effects on sense amplifier reliability in nano-scale CMOS","authors":"R. Menchaca, H. Mahmoodi","doi":"10.1109/ISQED.2012.6187515","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187515","url":null,"abstract":"Bias temperature instability (among other problems) is a key reliability issue with nanoscale CMOS transistors. Especially in sensitive circuits such as sense amplifiers of SRAM arrays, transistor aging may significantly increase the probability of failure. By analyzing the Current Based Sense Amplifier circuit and Voltage-Latched Sense Amplifier circuit through HSPICE simulations, we observe that under the effects of Negative Bias Temperature Instability (NBTI) aging alone, the failure probability increases for both circuits. However, under Positive Bias Temperature Instability (PBTI) only or the combined effects of both NBTI and PBTI, failure probability reduces over time.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133992151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design issues and insights of multi-fin bulk silicon FinFETs","authors":"Hsun Li, M. Chiang","doi":"10.1109/ISQED.2012.6187571","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187571","url":null,"abstract":"Multi-fin bulk silicon FinFET-based design issues and implications using 3D numerical simulation are presented for the first time. In order to gain sufficient drive current of each transistor, multi-fin layout is inevitable due to limited aspect ratio or fin height. However, how the multi-fin design impacts the circuit performance needs to be taken into account. Because of non-planar nature of the fin, conventional concept of multi-finger design in bulk CMOS technology does not apply. We found an extra leakage path underneath the fin spacing between source and drain. Such impact can be mitigated by additional substrate doping and proper gate-to-substrate isolation. Based on the proposed design window at a tight pitch control, good performance can be achieved while meeting leakage current requirement.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132289445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chip-package power delivery network resonance analysis and co-design using time and frequency domain analysis techniques","authors":"J. Watkins, Jai Pollayil, C. Chow, A. Sarkar","doi":"10.1109/ISQED.2012.6187543","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187543","url":null,"abstract":"Traditional methods of performing worst-case DC or static analysis serves limited purposes for power delivery network (PDN) validation, especially when it comes to modeling chip-package-PCB coupling or resonance behavior. These methods do not consider the inductive and capacitive elements that dominate the chip and package interaction. They also fail to capture the impact of simultaneous switching current in creating local hot-spots and global voltage rail collapse. In this study, an analysis methodology that combines the use of both time and frequency domain techniques to model the impact of Ldi/dt noise and the coupling of chip-level switching current with chip-package impedance is presented. The outlined techniques were used on a design targeting high-speed signal processing applications to identify resonance behavior of chip-package PDN systems. Simulations were performed on various configurations of the design to ensure that the proposed design changes would correct the resonance and other PDN related issues. 
The analysis flow, information on the various data used, run-time and performance statistics, and the results from these experiments are presented.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114030153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"History & Variation Trained Cache (HVT-Cache): A process variation aware and fine grain voltage scalable cache with active access history monitoring","authors":"Avesta Sasan, H. Homayoun, K. Amiri, A. Eltawil, F. Kurdahi","doi":"10.1109/ISQED.2012.6187540","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187540","url":null,"abstract":"Process variability and energy consumption are the two most formidable challenges facing the semiconductor industry nowadays. To combat these challenges, we present in this paper the “History and Variation Trained-Cache” (HVT-Cache) architecture. HVT-Cache enables fine grain voltage scaling within a memory bank by taking into account both memory access pattern and process variability. The supply voltage is changed with alterations in the memory access pattern to maximize power saving, while assuring safe operation (read and write) by guarding against process variability. In a case study, SimpleScalar simulation of the proposed 32KB cache architecture reports over 40% reduction in power consumption over standard SPEC2000 integer benchmarks while incurring an area overhead below 4% and an execution time penalty smaller than 1%.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114114779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast delay estimation with buffer insertion for through-silicon-via-based 3D interconnects","authors":"Young-Joon Lee, S. Lim","doi":"10.1109/ISQED.2012.6187499","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187499","url":null,"abstract":"For successful adoption of through-silicon-via-based 3D ICs, delay estimation techniques of 3D interconnects for early design stages are required. The 3D nets may connect gates/macros placed far apart and through-silicon-vias (TSVs) have large parasitic capacitances. Thus, buffers are inserted to reduce interconnect delay. To make good decisions in early design stages, the estimation of buffered delay should be fast and reasonably accurate. However, there has been no buffered delay estimation work for 3D ICs that considers proper delay models and TSV RC parasitics. In this work, we investigate several analytical delay models for 3D net delay estimation. Then, based on analytical formula and our heuristic algorithm, we propose how to estimate the buffered delay for movable TSV cases and fixed TSV cases. The effectiveness of our delay estimation technique is demonstrated with various 3D nets. Compared with the van Ginneken buffer insertion based delay estimation, our estimation provides solutions about 750 times faster with almost the same estimated delay.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123467348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clock mesh framework","authors":"Pinaki Chakrabarti, Vikram Bhatt, D. Hill, Aiqun Cao","doi":"10.1109/ISQED.2012.6187528","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187528","url":null,"abstract":"Clock mesh network as an on-chip variation (OCV) tolerant design solution is a well known technique but is used only in high-end designs because of more resource requirements and complex synthesis techniques involved compared to traditional clock tree based solution. With shrinking technology nodes, the effects of OCV has become a major hurdle in achieving timing closure. In advanced nodes there is relatively more chip area available. Clock mesh is the preferred technology for a high fanout clock distributed over a large physical area. Mesh technologies lower the adverse impact of guard-banding, which in conventional CTS leads to lower performance. Since clock mesh is not used predominantly in mainstream designs, an automatic and robust solution is missing in existing physical design automation tools for its synthesis. The available mesh solutions are tedious and manual in general and require several atomic steps to achieve the required structure and performance. One more challenge with a manual clock mesh flow is the strict requirement of engineering expertise to design the best clock mesh configuration and to perform its analysis. In this paper we present a semi-automatic clock mesh synthesis framework which addresses the above mentioned problems. This automated solution offers a minimal number of steps with which one can design a robust clock mesh network. The framework also includes mesh planning tools which can aid the designer in choosing the best mesh configuration and thus lowering the experience requirement. Solutions to common practical issues faced such as blockages, macros, rectilinear floorplan and hierarchical design are also discussed. This framework is fully functional in a leading industry standard physical design automation tool. 
Across various industry standard ASIC designs, with this solution we consistently achieved skew that is less than one-third of the skew obtained by conventional clock-tree synthesis. With our clock-mesh methodology, we could also restrict on-chip skew variation to 5% compared to 20%-25% achievable in clock-tree synthesis.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"328 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122328089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical power network synthesis for multiple power domain designs","authors":"Chieh-Jui Lee, Shih-Ying Liu, Chuan-Chia Huang, Hung-Ming Chen, Chang-Tzu Lin, Chia-Hsin Lee","doi":"10.1109/ISQED.2012.6187536","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187536","url":null,"abstract":"In this paper, we propose a methodology that synthesize and optimize the power network for design with multiple power domains. An architecture is presented to represent the power network with presence of sleep transistors. The power network is numerically modeled to RC network using Modified Nodal Analysis and solved using Conjugate Gradient Method. Regarding to IR drop effect mitigation, an optimization technique is proposed based on Simulated Annealing that minimize total power stripe area while satisfying a given IR drop constraint. In consideration of multiple power domains, the given power domains are represented in tree-like structure and our algorithm is recursively applied to synthesize and optimize the power network for each power domain in a hierarchical fashion. The proposed methodology is integrated to commercial design tool and experimented on real design case for evaluation. To ensure practical aspect of our approach, evaluation is performed on latest digital design commercial tool. Design data and parameters are extracted using Open Access. The result of our algorithm is fed back to latest commercial tool for final IR and EM analysis. Our algorithm is tested on both industrial testcase and academic MCNC benchmark. 
Comparing to conventional P/G network, using our power network synthesis can achieve 31%-35% reduction in total P/G area while satisfying maximum 10% IR-drop constraint.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115921420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Full-chip thermal analysis of 3D ICs with liquid cooling by GPU-accelerated GMRES method","authors":"Xuexin Liu, Zao Liu, S. Tan, Joseph A. Gordon","doi":"10.1109/ISQED.2012.6187484","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187484","url":null,"abstract":"Cooling and related thermal problems are the principal challenges facing 3D integrated circuits (3D-ICs). Active cooling techniques such as integrated inter-tier liquid cooling are promising alternatives for traditional fan-based cooling, which is insufficient for 3D-ICs. In this regard, fast full-chip transient thermal modeling and simulation techniques are required to design efficient and cost-effective cooling solutions for optimal performance, cost and reliability of packages and 3D ICs. In this paper, we propose an efficient finite difference based full-chip simulation algorithm for 3D-ICs using the GMRES method based on CPU platforms. Unlike existing fast thermal analysis methods, the new method starts from the physics-based heat equations to model 3D-ICs with inter-tier liquid cooling microchannels and directly solves the resulting partial differential equations using GMRES. To speedup the simulation, we further develop a preconditioned GPU-accelerated GMRES solver, GPU-GMRES, to solve the resulting thermal equations on top of some published sparse numerical routines. 
Experimental results show the proposed GPU-GMRES solver is up to 4.3× faster than parallel CPU-GMRES for DC analysis and 2.3× faster than parallel LU decomposition and one or two orders of magnitude faster than the single-thread CPU-GMRES for transient analysis on a number of thermal circuits and other published problems.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114960447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of low-power, scalable-throughput systems at near/sub threshold voltage","authors":"Meeta Srivastav, Michael B. Henry, L. Nazhandali","doi":"10.1109/ISQED.2012.6187556","DOIUrl":"https://doi.org/10.1109/ISQED.2012.6187556","url":null,"abstract":"Voltage scaling has been a prevalent method of saving energy for energy constrained applications. However, voltage scaling along with shrinking process technologies exacerbate process variation effects on transistor. Large variation in transistor parameters, result in high variation in performance and power across the chip. These effects if ignored at the stage of designing will result into unpredictable behavior when deployed in the actual field. In this paper, we leverage the benefits of voltage scaling methodology for obtaining energy efficiency and compensate for the loss in throughput by exploiting parallelism present in the various DSP designs. To achieve scalable throughput, we depend on both dynamic voltage scaling with a few operating voltage options and active unit scaling, where the number of active parallel units is reduced using power gating. We show that such hybrid method consumes 8%-77% less power compared to simple dynamic voltage scaling over different throughputs. We study this system architecture in two different workload environments, one static and one dynamic. In the former, the desired target throughput is predetermined and fixed and in the latter, it can be changed dynamically. We show that to achieve highest level of energy efficiency, the number of cores and the operating voltages vary widely between a base designs versus a process variation aware (PVA) design. We further show that the PVA design enjoys an average of 26.9% and 51.1% reduction in energy consumption for the static and dynamic designs respectively over six different DSP applications. 
This is because the base design needs to compensate for the effects of process variation as an after fact, while the PVA is able to make suitable decisions at the time of the design.","PeriodicalId":205874,"journal":{"name":"Thirteenth International Symposium on Quality Electronic Design (ISQED)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128203844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}