{"title":"Architecture-level thermal behavioral characterization for multi-core microprocessors","authors":"Duo Li, S. Tan, M. Tirumala","doi":"10.1109/ASPDAC.2008.4483994","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483994","url":null,"abstract":"In this paper, we investigate a new architecture-level thermal characterization problem from behavioral modeling perspective to address the emerging thermal related analysis and optimization problems for high-performance multi-core microprocessor design. We propose a new approach, called ThermPOF, to build the thermal behavioral models from the measured architecture thermal and power information. ThermPOF first builds the behavioral thermal model using generalized pencil-of-function (GPOF) method. And then to effectively model transient temperature changes, we proposed two new schemes to improve the GPOF. First we apply logarithmic-scale sampling instead of traditional linear sampling to better capture the temperature changing characteristics. Second, we modify the extracted thermal impulse response such that the extracted poles from GPOF are guaranteed to be stable without accuracy loss. To further reduce the model size, Krylov subspace based model order reduction is performed to reduce the order of the models in the state-space form. 
Experimental results on a practical quad-core microprocessor show that generated thermal behavioral models match the measured data very well.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126784115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application-specific Network-on-Chip architecture synthesis based on set partitions and Steiner Trees","authors":"Shan Yan, Bill Lin","doi":"10.1109/ASPDAC.2008.4483955","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483955","url":null,"abstract":"This paper considers the problem of synthesizing application-specific network-on-chip (NoC) architectures. We propose two heuristic algorithms called CLUSTER and DECOMPOSE that can systematically examine different set partitions of communication flows, and we propose Rectilinear-Steiner-tree (RST) based algorithms for generating an efficient network topology for each group in the partition. Different evaluation functions in fitting with the implementation backend and the corresponding implementation technology can be incorporated into our solution framework to evaluate the implementation cost of the set partitions and RST topologies generated. In particular, we experimented with an implementation cost model based on the power consumption parameters of a 70 nm process technology where leakage power is a major source of energy consumption. Experimental results on a variety of NoC benchmarks showed that our synthesis results can on average achieve a 6.92 x reduction in power consumption over the best standard mesh implementation. To further gauge the effectiveness of our heuristic algorithms, we also implemented an exact algorithm that enumerates all distinct set partitions. 
For the benchmarks where exact results could be obtained, our CLUSTER and DECOMPOSE algorithms on average can achieve results within 1% and 2% of exact results, with execution times all under 1 second whereas the exact algorithms took as much as 4.5 hours.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116595295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic re-coding of reference code into structured and analyzable SoC models","authors":"Pramod Chandraiah, R. Dömer","doi":"10.1109/ASPDAC.2008.4483991","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483991","url":null,"abstract":"The quality of the input system model has a direct bearing on the effectiveness of the system exploration and synthesis tools. Given a well-structured system model, tools today are effective in generating efficient implementations. However, readily available reference C codes are not conducive for system synthesis as they lack the necessary structure and analyzability needed by the design flow. Usually reference C code is manually converted into a SoC model by applying necessary transformations. The type of transformations depends on the underlying design flow and tools. Proper structural hierarchy is one essential feature needed for architectural exploration. In this paper, we provide automatic C code transformations to encapsulate functions and insert structural hierarchy to create well-structured and analyzable SoC models. Our automatic transformations, combined with interactive application of the designer's knowledge and experience, enable faster creation of structural hierarchy in C models and hence result in significant reduction of the overall design time.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"235 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116597327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heuristic power/ground network and floorplan co-design method","authors":"Xiaoyi Wang, Jin Shi, Yici Cai, Xianlong Hong","doi":"10.1109/ASPDAC.2008.4484025","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4484025","url":null,"abstract":"It is a trend to consider power supply integrity at an early stage to improve the design quality. In this paper, we propose a novel algorithm to optimize the floorplan together with the P/G network. Compared with previous methods, our algorithm can search the floorplan space more efficiently and therefore lead to better results. Further, we also propose a smart heuristic method to build a P/G mesh grid with optimized topology. Experimental results show our method can speed up the floorplanning process by about 10 times and reduce the routing area of the P/G network while maintaining the floorplan quality and P/G integrity.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129055427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Block cache for embedded systems","authors":"Dominic Hillenbrand, J. Henkel","doi":"10.1109/ASPDAC.2008.4483967","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483967","url":null,"abstract":"On chip memories provide fast and energy efficient storage for code and data in comparison to caches or external memories. We present techniques and algorithms that allow for an automated use of on chip memory for code blocks of instructions which are dynamically scheduled at runtime to increase performance and reduce power consumption.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123223972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic generation of hardware dependent software for MPSoCs from abstract system specifications","authors":"G. Schirner, A. Gerstlauer, R. Dömer","doi":"10.1109/ASPDAC.2008.4483954","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483954","url":null,"abstract":"Increasing software content in embedded systems and SoCs drives the demand to automatically synthesize software binaries from abstract models. This is especially critical for Hardware dependent Software (HdS) due to the tight coupling. In this paper, we present our approach to automatically synthesize HdS from an abstract system model. We synthesize driver code, interrupt handlers and startup code. We furthermore automatically adjust the application to use RTOS services. We target traditional RTOS-based multi-tasking solutions, as well as a pure interrupt-based implementation (without any RTOS). Our experimental results show the automatic generation of final binary images for six real-life target applications and demonstrate significant productivity gains due to automation. Our HdS synthesis is an enabler for efficient MPSoC development and rapid design space exploration.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123479740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A low-cost cryptographic processor for security embedded system","authors":"Ronghua Lu, Jun Han, Xiaoyang Zeng, Qing Li, L. Mai, Jia Zhao","doi":"10.1109/ASPDAC.2008.4483921","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483921","url":null,"abstract":"A low-cost cryptographic processor for security embedded system is presented in this paper. The processor, without any assistance of dedicated cryptographic coprocessors, is scalable and very efficient for popular cryptographic algorithms such as RSA/ECC, AES, Hash, etc. Based on SMIC 0.18 um standard CMOS technology, the core circuit of the test chip has only about 32 k gates, and a max frequency of 200 MHz, under which the 1024-bit RSA algorithm takes only 150 ms and the throughput of AES reaches 256 Mbits/s.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126262927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical power profile correlation for realistic thermal estimation","authors":"L. Singhal, Sejong Oh, E. Bozorgzadeh","doi":"10.1109/ASPDAC.2008.4484038","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4484038","url":null,"abstract":"At system level, the on-chip temperature depends both on power density and the thermal coupling with the neighboring regions. The problem of finding the right set of input power profile(s) for accurate temperature estimation has not been studied. Considering only average or peak power density may lead either to underestimation or overestimation of the thermal crisis, respectively. To provide more realistic temperature estimation, we propose to incorporate multiple power profiles. Using the proposed statistical methods to determine the closeness between the power profiles, we apply a clustering algorithm to identify few input power profiles. We incorporate them in a thermal-aware floorplanner and empirical results show that using the single input power profile (average or peak) leads to 37% degradation in critical wire delay and 20% degradation in wire length, compared to using the multiple input power profiles.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126531564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Total power optimization combining placement, sizing and multi-Vt through slack distribution management","authors":"T. Luo, D. Newmark, D. Pan","doi":"10.1109/ASPDAC.2008.4483973","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483973","url":null,"abstract":"Power dissipation is quickly becoming one of the most important limiters in nanometer IC design for leakage increases exponentially as the technology scaling down. However, power and timing are often conflicting objectives during optimization. In this paper, we propose a novel total power optimization flow under performance constraint. Instead of using placement, gate sizing, and multiple-Vt assignment techniques independently, we combine them together through the concept of slack distribution management to maximize the potential for power reduction. We propose to use the linear programming (LP) based placement and the geometric programming (GP) based gate sizing formulations to improve the slack distribution, which helps to maximize the total power reduction during the Vt-assignment stage. Our formulations include important practical design constraints, such as slew, noise and short circuit power, which were often ignored previously. We tested our algorithm on a set of industrial-strength manually optimized circuits from a multi-GHz 65 nm microprocessor, and obtained very promising results. 
To our best knowledge, this is the first work that combines placement, gate sizing and Vt swapping systematically for total power (and in particular leakage) management.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"2012 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128158059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pessimism reduction in coupling-aware static timing analysis using timing and logic filtering","authors":"D. Das, Kip Killpack, Chandramouli V. Kashyap, A. Jas, H. Zhou","doi":"10.1109/ASPDAC.2008.4483999","DOIUrl":"https://doi.org/10.1109/ASPDAC.2008.4483999","url":null,"abstract":"With continued scaling of technology into nanometer regimes, the impact of coupling induced delay variations is significant. While several coupling-aware static timers have been proposed, the results are often pessimistic with many false failures. We present an integrated iterative timing filtering and logic filtering based approach to reduce pessimism. We use a realistic coupling model based on arrival times and slews and show that non-iterative pessimism reduction algorithms proposed in previous research may give potentially non-conservative timing results. On a functional block from an industrial 65nm microprocessor, our algorithm produced a maximum pessimism reduction of 11.18% of cycle time over converged timing filtering analysis that does not consider logic constraints.","PeriodicalId":277556,"journal":{"name":"2008 Asia and South Pacific Design Automation Conference","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125669155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}