{"title":"DRAM-page based prediction and prefetching","authors":"Haifeng Yu, G. Kedem","doi":"10.1109/ICCD.2000.878296","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878296","url":null,"abstract":"This paper describes and evaluates DRAM-page based cache-line prediction and prefetching architecture. The scheme takes DRAM access timing into consideration in order to reduce prefetching overhead, amortizing the high cost of DRAM access by fetching two cache lines that reside on the same DRAM-page in a single access. On each DRAM access, one or two cache blocks may be prefetched. We combine three prediction mechanisms: history mechanism, stride, and one block lookahead, make them DRAM page sensitive and deploy them in an effective adaptive prefetching strategy. Our simulation shows that the prefetch mechanism can greatly improve system performance. Using a 32-KB prediction table cache, the prefetching scheme improves performance by 26%-55% on average over a baseline configuration, depending on the memory model. Moreover, the simulation shows that prefetching is more cost-effective than simply increasing L2-cache size or using a one block lookahead prefetching scheme. Simulation results also show that DRAM-page based prefetching yields higher relative performance as processors get faster, making the prefetching scheme more attractive for next generation processors.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116494515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Source-level transformations for improved formal verification","authors":"Brian D. Winters, A. Hu","doi":"10.1109/ICCD.2000.878353","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878353","url":null,"abstract":"A major obstacle to widespread acceptance of formal verification is the difficulty in using the tools effectively. Although learning the basic syntax and operation of a formal verification tool may be easy, expert users are often able to accomplish a verification task while a novice user encounters time-out or space-out attempting the same task. In this paper, we assert that often a novice user will model a system in a different manner (semantically equivalent, but less efficient for the verification tool) than an expert user would, that some of these inefficient modeling choices can be easily detected at the source-code level, and that a robust verification tool should identify these inefficiencies and optimize them, thereby helping to close the gap between novice and expert users. To test our hypothesis, we propose some possible optimizations for the Murφ verification system, implement the simplest of these, and compare the results on a variety of examples written by both experts and novices (the Murφ distribution examples, a set of cache coherence protocol models, and a portion of the IEEE 1394 Firewire protocol). The results support our assertion: a nontrivial fraction of the Murφ models written by novice users were significantly accelerated by the very simple optimization. Our findings strongly support further research in this area.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"585 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116547766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assignment-space exploration approach to concurrent data-path/floorplan synthesis","authors":"K. Ohashi, M. Kaneko, S. Tayu","doi":"10.1109/ICCD.2000.878310","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878310","url":null,"abstract":"As the geometrical design rules of VLSIs become finer into the order of deep sub-micron, the impact of wires to VLSI performance becomes larger relatively to the other components, and their estimation at RT-level description and performance-driven datapath synthesis need explicit connectivity information about RT-level architecture and its floorplan. In this paper, an assignment-driven approach to the datapath synthesis incorporated with one-dimensional floor planning is proposed. In our approach, scheduling and one-dimensional floorplanning, both of which are driven by iteratively generated functional unit and register assignment (binding), are performed fully concurrently. Pseudo-branch-and-bound assignment space exploration is adopted for generating assignments in this pilot system.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128892975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A methodology and tool for automated transformational high-level design space exploration","authors":"J. Gerlach, W. Rosenstiel","doi":"10.1109/ICCD.2000.878337","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878337","url":null,"abstract":"Objective of the methodology presented in this paper is to perform design space exploration on a high level of abstraction by applying high-level transformations. The paper concentrates on algorithmic approaches on controlling the iterative process of transformation selection. A novel modular algorithm for transformation control is presented and its effectiveness is experimentally validated. In combination with a large set of transformation algorithms and mechanisms for high-level estimation of transformation quality, there results a methodology for automated high-level design space exploration. All the concepts are summarized in a software tool called ExTra (Design Space Exploration Using Transformations). Finally, the results of the application of ExTra to the JPEG encoding algorithm are presented.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125736914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-performance, low-power skewed static logic in very deep-submicron (VDSM) technology","authors":"C. Kim, Jaesik Lee, K. Baek, Eric Martina, S. Kang","doi":"10.1109/ICCD.2000.878269","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878269","url":null,"abstract":"This paper presents S²L, which exhibits low-power, high-speed with use of positive feedback circuits and dual Vt. Topology-dependent dual Vt approach suppresses leakage current while boosting the performance in VDSM technology. S²L consumes less dynamic and static power compared to Monotonic Static (MS) CMOS. We present simulation results of NAND-NOR gate chains and 32-b adders to demonstrate the effectiveness of the S²L compared to other techniques. Design automation for the proposed circuit architecture can be achieved easily due to cascading flexibility.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134270821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A direct mapping FPGA architecture for industrial process control applications","authors":"J. T. Welch, J. Carletta","doi":"10.1109/ICCD.2000.878352","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878352","url":null,"abstract":"Industrial process control is an untapped market for field programmable gate arrays (FPGAs). Programs used for industrial process control are traditionally written in a graphical language called relay ladder logic, and implemented on programmable logic controllers (PLCs). The mapping of ladder logic onto typical FPGAs is a lengthy process, and results are hard to verify. We propose an FPGA architecture implementing relay ladder logic directly. Conversion to Boolean algebra is eliminated. Technology mapping is simple and direct. Placement and routing are also considerably simpler than in the general FPGA case. The architecture scales to devices of differing sizes and resources. This paper describes the FPGA architecture, and its role in high performance industrial process control.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127261887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Processors for mobile applications","authors":"F. Koushanfar, M. Potkonjak, V. Prabhu, J. Rabaey","doi":"10.1109/ICCD.2000.878354","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878354","url":null,"abstract":"Mobile processors form a large and very fast growing segment of semiconductor market. Although they are used in a great variety of embedded systems such as personal digital organizers (PDAs), smart cards, internet appliances, laptops, smart badges, cellular phones, wearable computers, and sensor networks, they share the common need for low power, code density, security, cost sensitivity and multimedia and communication processing. The goal of this paper is to review the field of processors for mobile applications. We survey a spectrum of processors, their system software, and the accompanying hardware components. The emphasis is on classification and identification of major technology and architecture trends. Companion to this paper is a WWW page [Mob00] which provides comprehensive additional material about mobile processors.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126397765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multilevel reverse-carry adder","authors":"J. Bruguera, T. Lang","doi":"10.1109/ICCD.2000.878282","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878282","url":null,"abstract":"The multilevel reverse-carry approach has been proposed previously for fast computation of the most-significant carry of an adder. We extend this approach to generate several carries and apply it to the implementation of the complete adder. Specifically, the operands are split into blocks and each block is added to produce the sum and the sum plus one. Concurrently with these additions the multilevel reverse-carry approach is used to generate the input carries of these blocks. Finally, these carries are used to select among the sum and the sum plus one. We have evaluated the resulting architecture for a 64-bit adder, considering the load introduced by long connections, and we estimate a reduction of about 15% in the critical path delay with respect to traditional implementations of prefix-tree based adders.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121231035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low power video object motion-tracking architecture for very low bit rate online video applications","authors":"Wael Badawy, M. Bayoumi","doi":"10.1109/ICCD.2000.878333","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878333","url":null,"abstract":"This paper presents a low power VLSI architecture for video object motion-tracking that can be used in very low bit rate online video applications. Power has been reduced at both algorithmic and arithmetic levels. The video object motion-tracking architecture consists of two main parts, a mesh-based motion estimation unit and a mesh-based motion compensation unit. The mesh-based motion estimation unit implements parallel block matching motion estimation units to optimize the latency. The mesh-based motion compensation unit uses parallel multiplication-free affine core. The architecture has been prototyped and its performance measures have been evaluated. This processor can be used in online object-based video applications.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126682777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The M·CORE™ M340 unified cache architecture","authors":"Afzal Malik, B. Moyer, D. Čermák","doi":"10.1109/ICCD.2000.878347","DOIUrl":"https://doi.org/10.1109/ICCD.2000.878347","url":null,"abstract":"The MCORE M340 architecture was designed to target the low-power, embedded application market. Building upon the MCORE M3 core, the M340 provides enhancements through the addition of an 8 K, 4-way set-associative unified (instruction/data) cache and an on-chip Memory Management Unit (MMU) that contains a single unified 64-entry TLB capable of mapping multiple page sizes. To achieve the power and performance requirements that today's portable electronics demand, the M340 provides programmable features that allow the architecture to be optimized for a given application. This paper discusses the features of the M340 cache sub-system and illustrates the power and performance improvements that can be achieved through proper configuration.","PeriodicalId":437697,"journal":{"name":"Proceedings 2000 International Conference on Computer Design","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116648169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}