{"title":"Dynamic flip-flop with improved power","authors":"N. Nedovic, V. Oklobdzija","doi":"10.1109/ICCD.2000.878303","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878303","url":null,"abstract":"An improved design of a dynamic flip-flop is presented. The proposed design overcomes the problem of the glitch at the output and improves power-delay product for about 27%, while preserving logic embedding property. This is accomplished by equalizing the t/sub pLH/ and t/sub pHL/ of the flip-flop and careful design of keeper elements in the circuit. New design introduces insignificant area increase.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"299 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114579208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast subword permutation instructions using omega and flip network stages","authors":"X. Yang, R. Lee","doi":"10.1109/ICCD.2000.878264","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878264","url":null,"abstract":"This paper proposes a new way of efficiently doing arbitrary n-bit permutations in programmable processors modeled on the theory of omega and flip networks. The new om-flip instruction we introduce can perform any permutation of n subwords in log n instructions, with the subwords ranging from half-words down to single bits. Each omflip instruction can be done in a single cycle, with very efficient hardware implementation. The omflip instruction enhances a programmable processor's capability for handling multimedia and security applications which use subword permutations extensively.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123898059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast hierarchical floorplanning with congestion and timing control","authors":"Abhishek Ranjan, K. Bazargan, M. Sarrafzadeh","doi":"10.1109/ICCD.2000.878308","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878308","url":null,"abstract":"We propose fresher looks into already existing hierarchical partitioning based floorplan design methods and their relevance in providing faster alternatives to conventional approaches. We modify the existing partitioning based floor-planner to handle congestion and timing. We also explore the applicability of traditional sizing theorem for combining two modules based on their sizes and interconnecting wirelength. The results show that our floorplanning approach can produce floorplans hundred times faster and at the same time achieving better quality (on average 20% better wirelength, better congestion and better timing optimization) than that of pure simulated annealing based floorplanner.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124487837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural impact of secure socket layer on Internet servers","authors":"K. Kant, R. Iyer, P. Mohapatra","doi":"10.1109/ICCD.2000.878263","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878263","url":null,"abstract":"Secure socket layer (SSL) is the most popular protocol used in the Internet for facilitating secure communications. In this paper, we analyze the performance and architectural impact of SSL on the servers in terms of various parameters such as throughput, utilization, cache sizes, cache miss ratios, number of processors, control dependencies, file access sizes, bus transactions, network load, etc. The major conclusions from this study are as follows: The use of SSL increases computational cost of the transactions by a factor of 5-7. SSL transactions do not benefit much from a larger L2 cache, but a larger L1 cache would be helpful. A complex logic for handling control dependencies is not useful for SSL transaction as the frequency of branches is very low. Because SSL workload is highly CPU bound, it may be possible to enhance SSL performance by using a number of other architectural features as well.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115063268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A technique for identifying RTL and gate-level correspondences","authors":"S. Ravi, N. Jha, Indradeep Ghosh, V. Boppana","doi":"10.1109/ICCD.2000.878351","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878351","url":null,"abstract":"In this paper, we consider the mapping problem of identifying correspondences between a signal in a high-level specification and a net in a lower-level implementation for a given design. Conventional techniques use shared names to associate a signal with a net whenever possible. However, given that a synthesis flow may not preserve names, solutions to the above problem eventually take recourse to expensive alternatives such as formal verification. This paper provides a robust framework for identifying RTL signal to gate-level net correspondences for a given design. Our technique exploits the observation that circuit diagnosis provides a convenient means for locating faults in a gate-level network. Since our problem requires locating gate-level nets corresponding to RTL signals, we formulate the mapping problem as a query whose solution is provided by a circuit diagnosis engine. Our experimental work with industrial designs for many mapping cases shows that our solution to the mapping problem is (i) fast, and (ii) precise in identifying the gate-level equivalents (the number of nets returned by our mapping engine for a query is typically or 2 even for designs with tens of thousands of VHDL lines).","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"699 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126322300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Output prediction logic: a high-performance CMOS design technique","authors":"L. McMurchie, S. Kio, G. Yee, T. Thorp, C. Sechen","doi":"10.1109/ICCD.2000.878293","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878293","url":null,"abstract":"We present Output Prediction Logic (OPL), a technique that can be applied to conventional CMOS logic families to obtain considerable speedups. When applied to static CMOS, OPL retains the restoring character of the logic family, including its high noise margins. Speedups of 2X to 3X over (optimized) conventional static CMOS are demonstrated for a variety of circuits, ranging from chains of gates, to datapath circuits, ranging from chains of gates, to datapath circuits, and to random logic benchmarks. Such speedups are obtained using identical netlists without remapping. When applied to pseudo-nMOS and dynamic families, in combination with remapping to wide-input NORs, OPL yields speedups of 4X to 5X over static CMOS. Since OPL applied to static CMOS is faster than conventional domino logic, and since it has higher noise margins than domino logic, we believe it will scale much better than domino with future processing technologies.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129195011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power-sensitive multithreaded architecture","authors":"J. Seng, D. Tullsen, George Z. N. Cai","doi":"10.1109/ICCD.2000.878286","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878286","url":null,"abstract":"The power consumption of microprocessors is becoming increasingly important in design decisions, not only in mobile processors, but also now in high-performance processors. Power-conscious design must therefore go beyond technology and low-level design, but also change the way modern processors are architected. A multithreading processor is attractive in the context of low-power or power-constrained devices for many of the same reasons that enable its high throughput. Primarily, it supplies extra parallelism via multiple threads, allowing the processor to rely much less heavily on speculation. We show that a simultaneous multithreading processor utilizes up to 22% less energy per instruction than a single-threaded architecture. We also explore other power optimizations that are particular to multithreaded architectures, either because they are unavailable to or unreasonable for single-thread architectures.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130659267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On solving stack-based incremental satisfiability problems","authors":"Joonyoung Kim, J. Whittemore, K. Sakallah","doi":"10.1109/ICCD.2000.878311","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878311","url":null,"abstract":"Boolean satisfiability (SAT) and its application to a number of electronic design automation (EDA) problems have been the topic of extensive study over the lost couple of decades. In many cases, a set of related SAT problems need to be solved in order to obtain an answer to a given application-specific problem. Incremental satisfiability (ISAT) refers to solving a set of related SAT problems by augmenting a previously solved problem with additional constraints, thereby reusing previous decision sequences. In this paper, we present a new ISAT engine that supports both the addition and removal of constraints. This can be achieved by keeping track of the relationships between constraints. We identify and define a special type of ISAT that occurs frequently in the context of path sensitization called stack-based ISAT and define the structure of this as a problem tree. In this type of ISAT constraints are allowed to be added and removed only in last-in first-out (LIFO) order. We also introduce a solution caching mechanism to expedite the search by recording and retrieving solutions to intermediate nodes in a problem tree.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126634579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An adder using charge sharing and its application in DRAMs","authors":"Hak-soo Yu, Songjun Lee, J. Abraham","doi":"10.1109/ICCD.2000.878301","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878301","url":null,"abstract":"This paper develops a novel technique which uses charge sharing as a method to perform addition in memory arrays. DRAM cells are conventionally used as storage elements and their data read through charge sharing. In our approach, DRAM cells are used as arithmetic units, thus saving area and power consumption in system-on-silicon applications. An adder in DRAM is designed, and its HSPICE simulation results are presented to show the viability of the proposed scheme.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128943402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating signal processing and multimedia applications on SIMD, VLIW and superscalar architectures","authors":"D. Talla, L. John, V. Lapinskii, B. Evans","doi":"10.1109/ICCD.2000.878283","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878283","url":null,"abstract":"This paper aims to provide a quantitative understanding of the performance of DSP and multimedia applications on very long instruction word (VLIW), single instruction multiple data (SIMD), and superscalar processors. We evaluate the performance of the VLIW paradigm using Texas Instruments Inc.'s TMS320C62xx processor and the SIMD paradigm using Intel's Pentium II processor (with MMX) on a set of DSP and media benchmarks. Tradeoffs in superscalar performance are evaluated with a combination of measurements on Pentium II and simulation experiments on the SimpleScalar simulator. Our benchmark suite includes kernels (filtering, autocorrelation, and dot product) and applications (audio effects, G.711 speech coding, and speech compression). Optimized assembly libraries and compiler intrinsics were used to create the SIMD and VLIW code. We used the hardware performance counters on the Pentium II and the stand-alone simulator for the C62xx to obtain the execution cycle counts. In comparison to non-SIMD Pentium II performance, the SIMD version exhibits a speedup ranging from 1.0 to 5.5 while the speedup of the VLIW version ranges from 0.63 to 9.0. The benchmarks are seen to contain large amounts of available parallelism, however, most of it is inter-iteration parallelism. Out-of-order execution and branch prediction are observed to be extremely important to exploit such parallelism in media applications.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134227064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}