{"title":"A novel O(1) parallel deadlock detection algorithm and architecture for multi-unit resource systems","authors":"Xiang Xiao, J. Lee","doi":"10.1109/ICCD.2007.4601942","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601942","url":null,"abstract":"This paper introduces a novel O(1) parallel deadlock detection approach for multi-unit resource system-on-a-chips (SoCs), inspired by Kimpsilas method in O(1) detection as well as Shiupsilas method in parallel processing. Our contributions are (i) the first O(1) hardware deadlock detection and (ii) O(min(m, n)) preparation, both for multi-unit resource systems, where m and n are the number of processes and resources, respectively. O(min(m, n)), previously O(m times n), is achieved by performing all the searches for sink nodes for each and every resource in parallel in hardware over a matrix representing resource allocations as well as other auxiliary matrices. Our experiments demonstrate that deadlock detection always takes two clock cycles.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"12 1","pages":"480-487"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82225130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiler-assisted architectural support for program code integrity monitoring in application-specific instruction set processors","authors":"Hai Lin, Xuan Guan, Yunsi Fei, Z. Shi","doi":"10.1109/ICCD.2007.4601899","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601899","url":null,"abstract":"(ASIPs) are being increasingly used in mobile embedded systems, the ubiquitous networking connections have exposed these systems under various malicious security attacks, which may alter the program code running on the systems. In addition, soft errors in microprocessors can also change program code and result in system malfunction. At the instruction level, all code modifications are manifested as bit flips. In this work, we present a generalized methodology for monitoring code integrity at run-time in ASIPs, where both the instruction set architecture (ISA) and the underlying microarchitecture can be customized for a particular application domain. Based on the microoperation-based monitoring architecture that we have presented in previous work, we propose a compiler-assisted and application-controlled management approach for the monitoring architecture. Experimental results show that compared with the OS-managed scheme and other compiler-assisted schemes, our approach can detect program code integrity compromises with much less performance degradation.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"9 1","pages":"187-193"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72668289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tutorial: Software-defined radio technology","authors":"M. Cummings, T. Cooklev","doi":"10.1109/ICCD.2007.4601887","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601887","url":null,"abstract":"Software defined radio (SDR) is one of the most important emerging disruptive technologies that shaped wireless communication and mobile computing industries. The \"ideal\" software radio consists of a wideband antenna, wideband ADC and DAC, and a programmable processor. This paper discusses the development of software radios along with their applications in different fields of telecommunication.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"32 1","pages":"103-104"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75743650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Post-layout comparison of high performance 64b static adders in energy-delay space","authors":"Sheng Sun, C. Sechen","doi":"10.1109/ICCD.2007.4601931","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601931","url":null,"abstract":"Our objective was to determine the most energy efficient 64 b static CMOS adder architecture, for a range of high-performance delay targets. We examine extensively carry-lookahead (CLA) and carry-select adders with a wide range of tradeoffs in logic levels, fanouts and wiring complexity. We propose sparse CLA adder architectures based on buffering techniques to reduce logic redundancy and improve energy efficiency. All the designs were implemented using an energy-delay layout optimization flow with full RC extraction. Our new 64 b adder designs have a relative delay as low as 9.9 F04 (fanout-offour inverter) delays and promise better scaling for smaller technology nodes. They yield the best energy efficiency for a wide range of delay targets and are 30%, 15% and 7% more energy efficient than full Kogge-Stone, sparse-2 Kogge-Stone and Han-Carlson, respectively, at the fastest points. They consume only about 1/3 the energy of dynamic adders.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"8 1","pages":"401-408"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77796177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Brian J. Hickmann, A. Krioukov, M. Schulte, M. A. Erle
{"title":"A parallel IEEE P754 decimal floating-point multiplier","authors":"Brian J. Hickmann, A. Krioukov, M. Schulte, M. A. Erle","doi":"10.1109/ICCD.2007.4601916","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601916","url":null,"abstract":"Decimal floating-point multiplication is important in many commercial applications including banking, tax calculation, currency conversion, and other financial areas. This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-point Arithmetic (IEEE P754). The novelty of the design is that it is the first parallel decimal floating-point multiplier offering low latency and high throughput. This design is based on a previously published parallel fixed-point decimal multiplier which uses alternate decimal digit encodings to reduce area and delay. The fixed-point design is extended to support floating-point multiplication by adding several components including exponent generation, rounding, shifting, and exception handling. Area and delay estimates are presented that show a significant latency and throughput improvement with a substantial increase in area as compared to the only published IEEE P754 compliant sequential floating-point multiplier. To the best of our knowledge, this is the first publication to present a fully parallel decimal floating-point multiplier that complies with IEEE P754.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"50 1","pages":"296-303"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81149755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wanping Zhang, Ling Zhang, Rui Shi, He Peng, Zhi Zhu, L. Chua-Eoan, R. Murgai, Toshiyuki Shibuya, N. Ito, Chung-Kuan Cheng
{"title":"Fast power network analysis with multiple clock domains","authors":"Wanping Zhang, Ling Zhang, Rui Shi, He Peng, Zhi Zhu, L. Chua-Eoan, R. Murgai, Toshiyuki Shibuya, N. Ito, Chung-Kuan Cheng","doi":"10.1109/ICCD.2007.4601939","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601939","url":null,"abstract":"This paper proposes an efficient analysis flow and an algorithm to identify the worst case noise for power networks with multiple clock domains. First, we apply the Laplace transform on the input current sources to derive the analytical formula. Then, we calculate the circuit frequency response with logarithmic scale frequency components. The frequency domain response is approximated by a rational function using vector fitting modeling. The rational function is used to derive the natural frequency of the power ground networks, and can be converted back into time domain easily. Based on the analysis results, we then present the worst case clock gating pattern algorithm to analyze the power networks with multiple clock domains. The most expensive part of the proposed algorithm is the matrix solving: O(F(N) ldr log f ldr D). Function F is the complexity of iterative solution of complex matrix with dimension N. We assume that there are D clock domains and the frequency spans from 0 to f Hz. Experimental results show that our method is up to 60X faster than HSPICE, and can analyze large circuits which are not affordable by HSPICE.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"52 1","pages":"456-463"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81632033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Jacob, A. Zia, Okan Erdogan, P. Belemjian, Peng Jin, Jin Woo Kim, M. Chu, R. Kraft, J. McDonald
{"title":"Amdahl’s figure of merit, SiGe HBT BiCMOS, and 3D chip stacking","authors":"P. Jacob, A. Zia, Okan Erdogan, P. Belemjian, Peng Jin, Jin Woo Kim, M. Chu, R. Kraft, J. McDonald","doi":"10.1109/ICCD.2007.4601901","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601901","url":null,"abstract":"Forty years ago Gene Amdahl published a figure of merit for parallel computation, which proved extremely controversial. The controversy still rages today, although those that have looked closely at this figure of merit conclude that it is correct, but perhaps misinterpreted. In this paper we will look at a small variation on that law that suggests computer designers should take a closer look at two emerging technologies, SiGe HBT BiCMOS and 3D chip stacking. We may be overlooking a way to continue the clock race, and in so doing accomplish better parallelism.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"330 1","pages":"202-207"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76367543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A technique for selecting CMOS transistor orders","authors":"T. Chiang, C. Y. Chen, Weiyu Chen","doi":"10.1109/ICCD.2007.4601936","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601936","url":null,"abstract":"Transistor reordering has been known to be effective in reducing delays of a circuit with nearly zero penalties. However, techniques to determine good transistor orders have not been proposed in literature. Previous work on this has to resort to running SPICE for all meaningful transistor orders and selecting a best one, which is extremely time-consuming. This paper proposes an efficient and accurate technique for determining best transistor orders without running SPICE simulations. Experimental results from SPICE3 show that the predictions are very accurate.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"5 1","pages":"438-443"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84995446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting errors in a polynomial basis multiplier using multiple parity bits for both inputs","authors":"Siavash Bayat Sarmadi, M. A. Hasan","doi":"10.1109/ICCD.2007.4601926","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601926","url":null,"abstract":"This paper investigates the concurrent detection of multiple-bit errors in polynomial basis (PB) multipliers over binary extension fields. To this end, multiple parity bits are considered for both inputs of the multiplier. For the multiplier architecture considered here, the two inputs go through considerably different sets of circuits and this allows us to use different number of parity bits with the inputs. In a bit-parallel implementation of a GF(2163) PB multiplier with eight parity bits for the first input and three parity bits for the second input, the area overhead and the probability of error detection are approximately 55.59% and 0.997, respectively. Additionally, the average time overhead of the scheme implemented in a bit-parallel fashion is approximately 25%.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"1 1","pages":"368-375"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83909991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamically compressible context architecture for low power coarse-grained reconfigurable array","authors":"Yoonjin Kim, R. Mahapatra","doi":"10.1109/ICCD.2007.4601930","DOIUrl":"https://doi.org/10.1109/ICCD.2007.4601930","url":null,"abstract":"Most of the coarse-grained reconfigurable array architectures (CGRAs) are composed of reconfigurable ALU arrays and configuration cache (or context memory) to achieve high performance and flexibility. Specially, configuration cache is the main component in CGRA that provides distinct feature for dynamic reconfiguration in every cycle. However, frequent memory-read operations for dynamic reconfiguration cause much power consumption. Thus, reducing power in configuration cache has become critical for CGRA to be more competitive and reliable for its use in embedded systems. In this paper, we propose dynamically compressible context architecture for power saving in configuration cache. This power-efficient design of context architecture works without degrading the performance and flexibility of CGRA. Experimental results show that the proposed approach saves up to 39.72% power in configuration cache with negligible area overhead.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"1 1","pages":"395-400"},"PeriodicalIF":0.0,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85328436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}