MICRO 24: Latest Papers

On reconfigurable on-chip data caches
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123504
F. Dahlgren, P. Stenström
{"title":"On reconfigurable on-chip data caches","authors":"F. Dahlgren, P. Stenström","doi":"10.1145/123465.123504","DOIUrl":"https://doi.org/10.1145/123465.123504","url":null,"abstract":"Cache memory has been shown to be the most important technique to bridge the gap between the processor speed and the memory access time. The advent of high-speed RISC and superscalar processors, however, calls for small on-chip data caches. Due to physical limitations, these should be simply designed and yet yield good performance. In this paper, we present new cache architectures that address the problems of conflict misses and non-optimal line sizes in the context of direct-mapped caches. Our cache architectures can be reconfigured by software in a way that matches the reference pattern for array data structures. We show that the implementation cost of the reconfiguration capability is negligible. We also show simulation results that demonstrate significant performance improvements for both methods.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117063051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
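Illustration: the conflict-miss problem named in the Dahlgren and Stenström abstract can be reproduced with a few lines of simulation. The sketch below is not the paper's reconfigurable cache. It only models a plain direct-mapped cache and shows how two arrays placed exactly one cache size apart evict each other on every access. The cache geometry (1 KiB, 32-byte lines), the element size, and the function names are assumptions chosen for the example.

```python
def count_misses(addresses, num_lines, line_size):
    """Simulate a direct-mapped cache and return the number of misses."""
    tags = [None] * num_lines            # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size        # memory block holding this byte
        index = block % num_lines        # the only line this block may occupy
        tag = block // num_lines         # distinguishes blocks sharing a line
        if tags[index] != tag:
            misses += 1
            tags[index] = tag            # replace whatever was there
    return misses


def interleaved(base_a, base_b, n, elem_size=8):
    """Byte addresses touched by a loop that reads a[i] and then b[i]."""
    trace = []
    for i in range(n):
        trace.append(base_a + i * elem_size)
        trace.append(base_b + i * elem_size)
    return trace


CACHE_BYTES = 1024                       # hypothetical cache size
LINE = 32                                # hypothetical line size in bytes
LINES = CACHE_BYTES // LINE

# b starts exactly one cache size after a, so a[i] and b[i] share a line.
conflicting = count_misses(interleaved(0, CACHE_BYTES, 64), LINES, LINE)

# Shifting b by one extra line removes the collision.
padded = count_misses(interleaved(0, CACHE_BYTES + LINE, 64), LINES, LINE)

print(conflicting, padded)
```

With these assumed parameters every access in the first layout misses, while the shifted layout only misses once per new block. Padding is used here purely to expose the effect; the abstract describes a software-reconfigurable mapping instead.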
An instruction-level performance analysis of the Multiflow TRACE 14/300
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123468
M. Schuette, John Paul Shen
{"title":"An instruction-level performance analysis of the Multiflow TRACE 14/300","authors":"M. Schuette, John Paul Shen","doi":"10.1145/123465.123468","DOIUrl":"https://doi.org/10.1145/123465.123468","url":null,"abstract":"Advances in compiler technology have recently led to the introduction of a new architectural paradigm, called the Very Long Instruction Word (VLIW) architecture. The Multiflow TRACE series of processors is the first commercial line of processors with this architecture. Information on the performance of the TRACE is of significant value to the design of all processors intended to exploit fine-grain parallelism. This paper presents results concerning the performance and resource utilization of the TRACE 14/300 on a set of 11 common scientific programs written in both C and FORTRAN. Several characteristics of the application, architecture, implementation, and compiler that contribute to the observed results are identified. Performance of the TRACE 14/300 is also measured on several standard benchmarks, including the SPEC benchmark suite. Comparisons are made with results from other processors. The architectural effectiveness of the TRACE 14/300 appears to be better than most existing RISC workstations and is comparable to the best current superscalar workstations.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122262473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Software pipelining for transport-triggered architectures
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123479
J. Hoogerbrugge, H. Corporaal, Hans M. Mulder
{"title":"Software pipelining for transport-triggered architectures","authors":"J. Hoogerbrugge, H. Corporaal, Hans M. Mulder","doi":"10.1145/123465.123479","DOIUrl":"https://doi.org/10.1145/123465.123479","url":null,"abstract":"This paper discusses software pipelining for a new class of architectures that we call transport-triggered. These architectures reduce the interconnection requirements between function units. They also exhibit code scheduling possibilities which are not available in traditional operation-triggered architectures. In addition the scheduling freedom is extended by the use of so-called hybrid-pipelined function units. In order to exploit this freedom, existing scheduling techniques need to be extended. We present a software pipelining technique, based on Lam’s algorithm, which exploits the potential of transport-triggered architectures. Performance results are presented for several benchmark loops. Depending on the available transport capacity, MFLOP rates may increase significantly as compared to scheduling without the extra degrees of freedom. As stated in [5] transport-triggered MOVE architectures have extra instruction scheduling degrees of freedom. This paper investigates if and how those extra degrees influence the software pipelining iteration initiation interval. It therefore adapts the existing algorithms for software pipelining as developed by Lam [2]. It is shown that transport-triggering may lead to a significant reduction of the iteration initiation interval and therefore to an increase of the MIPS and/or MFLOPS rate. The remainder of this paper starts with an introduction of the MOVE class of architectures; it clarifies the idea of transport-triggered architectures. Section 3 formulates the software pipelining problem and its algorithmic solution for transport-triggered architectures. Section 4 describes the architecture characteristics and benchmarks used for the measurements. In order to research the influence of the extra scheduling freedom, the algorithm has been applied to the benchmarks under different scheduling disciplines. The next section (5) compares and analyzes the measurements. Finally section 6 gives several conclusions and indicates further research to be done.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122412714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
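Illustration: the Hoogerbrugge, Corporaal and Mulder abstract ties the iteration initiation interval to the available transport capacity. A rough way to see that dependency is the standard lower bound used in software pipelining work: the interval can be no smaller than the tightest resource bound or the tightest recurrence bound. The sketch below computes only that lower bound and is not the paper's scheduling algorithm. The resource names (`move_bus`, `fpu`), the per-iteration usage counts, and the single recurrence are hypothetical values chosen for the example.

```python
import math


def resource_bound(uses_per_iteration, units_available):
    """Lower bound from resources: each resource needs ceil(uses / units) cycles."""
    return max(
        math.ceil(uses_per_iteration[name] / units_available[name])
        for name in uses_per_iteration
    )


def recurrence_bound(cycles):
    """Lower bound from dependence cycles given as (total latency, iteration distance)."""
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=1)


def min_initiation_interval(uses_per_iteration, units_available, cycles):
    """The initiation interval cannot be below either bound."""
    return max(
        resource_bound(uses_per_iteration, units_available),
        recurrence_bound(cycles),
    )


# Hypothetical loop body: 9 data transports and 2 floating-point operations
# per iteration, plus one loop-carried dependence of latency 3 and distance 1.
uses = {"move_bus": 9, "fpu": 2}
recurrences = [(3, 1)]

narrow = min_initiation_interval(uses, {"move_bus": 2, "fpu": 1}, recurrences)
wide = min_initiation_interval(uses, {"move_bus": 4, "fpu": 1}, recurrences)

print(narrow, wide)
```

In this made-up case the two-bus machine is limited by transports, while with four buses the recurrence becomes the limit, which is one way more transport capacity can shorten the interval.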
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123478
William Y. Chen, S. Mahlke, P. Chang, Wen-mei W. Hwu
{"title":"Data access microarchitectures for superscalar processors with compiler-assisted data prefetching","authors":"William Y. Chen, S. Mahlke, P. Chang, Wen-mei W. Hwu","doi":"10.1145/123465.123478","DOIUrl":"https://doi.org/10.1145/123465.123478","url":null,"abstract":"The performance of superscalar processors is more sensitive to the memory system delay than their single-issue predecessors. This paper examines alternative data access microarchitectures that effectively support compiler-assisted data prefetching in superscalar processors. In particular, a prefetch buffer is shown to be more effective than increasing the cache dimension in solving the cache pollution problem. All in all, we show that a small data cache with compiler-assisted data prefetching can achieve a performance level close to that of an ideal cache.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133695246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 110
An analysis of the information content of address reference streams
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123470
J. Becker, A. Park, M. Farrens
{"title":"An analysis of the information content of address reference streams","authors":"J. Becker, A. Park, M. Farrens","doi":"10.1145/123465.123470","DOIUrl":"https://doi.org/10.1145/123465.123470","url":null,"abstract":"We analyze the information content of several address reference streams. Our results indicate that a new scheme, based on Dynamic Huffman Coding [Vitt87], can encode a typical 32-bit address in four to seven bits. Unlike previous schemes used to estimate the information content of address words [HaDa77, Hamm77], our scheme is completely on-line and does not rely on precomputation of address transition probabilities. Our results imply that at least 83% of address bits in the traces we studied contain redundant information. Although our coding scheme is too complex and computationally expensive to implement in practice, it provides a lower bound on the bandwidth that can be achieved by practical compression schemes. Through use of these address compression techniques, the number of bus lines and I/O pins required to transmit address information between processor and memory can be greatly reduced.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114072904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
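Illustration: the Becker, Park and Farrens abstract measures how few bits an address really carries. The sketch below is not their on-line Dynamic Huffman scheme. It is a much simpler offline estimate, the zeroth-order entropy of the differences between consecutive addresses, which relies on exactly the kind of precomputed statistics the abstract says its scheme avoids. The function name and the tiny traces are assumptions made for the example.

```python
import math
from collections import Counter


def delta_entropy_bits(addresses):
    """Average bits per reference if each address delta were coded ideally.

    Uses the empirical distribution of deltas between consecutive addresses,
    so it needs the whole trace up front (an offline estimate).
    """
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    counts = Counter(deltas)
    total = len(deltas)
    return -sum(
        (count / total) * math.log2(count / total) for count in counts.values()
    )


# A purely sequential walk has a single delta, so it carries no information.
sequential = delta_entropy_bits([0, 4, 8, 12])

# Two equally likely deltas (4 and 8) cost one bit per reference.
alternating = delta_entropy_bits([0, 4, 12, 16, 24])

print(sequential, alternating)
```

Real traces mix many deltas, so the estimate lands somewhere between these extremes. The point of the sketch is only that regular reference patterns need far fewer bits than the full address width.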
Executing loops on a fine-grained MIMD architecture
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123505
Sunah Lee, Rajiv Gupta
{"title":"Executing loops on a fine-grained MIMD architecture","authors":"Sunah Lee, Rajiv Gupta","doi":"10.1145/123465.123505","DOIUrl":"https://doi.org/10.1145/123465.123505","url":null,"abstract":"We present techniques for exploiting parallelism extracted from loops on an MIMD system. Parallelism is exploited through parallel execution of instructions on multiple processors as well as the pipelined nature of individual processors. The processors, based upon the load/store architecture, read/write operands from/to private registers, shared registers, and channel queues. If the communication of a value from one processor to another requires synchronization, then a channel is used; otherwise a shared register is used to communicate the value. The receiving processor reads the values from a channel queue in the order they are written to the channel by the sending processor. The scheduling of operations is carried out in a manner that reduces interprocessor communication. Such schedules reduce the likelihood of one processor impeding the progress of other processors.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"367 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122846441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
The effect of real data cache behavior on the performance of a microarchitecture that supports dynamic scheduling
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123472
M. Butler, Y. Patt
{"title":"The effect of real data cache behavior on the performance of a microarchitecture that supports dynamic scheduling","authors":"M. Butler, Y. Patt","doi":"10.1145/123465.123472","DOIUrl":"https://doi.org/10.1145/123465.123472","url":null,"abstract":"Recent studies have demonstrated that significant parallelism exists in single instruction streams and can be exploited if the microarchitecture is equipped to take advantage of it. These studies, however, have assumed optimistic memory systems, including 100 percent data cache hit rates and multiple independent cache ports. There has been legitimate concern that when the optimistic memory systems are replaced with realistic memory systems, much of the increase in performance will be lost. In this study we extend our previous work to investigate the effects of realistic cache characteristics on performance. We model the execution of three integer benchmarks and two floating point benchmarks from the SPEC suite for a series of machine configurations and cache models. For moderate-sized, direct-mapped caches, interleaved to provide the statistical bandwidth required, we have found performance of between 2.4 and 4.6 instructions per cycle, and degradation of between 1 and 17 percent over the ideal memory system.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122519276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
A quantitative analysis of locality in dataflow programs
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123469
W. M. Miller, W. Najjar, A. Böhm
{"title":"A quantitative analysis of locality in dataflow programs","authors":"W. M. Miller, W. Najjar, A. Böhm","doi":"10.1145/123465.123469","DOIUrl":"https://doi.org/10.1145/123465.123469","url":null,"abstract":"Substantial evidence suggests that exploiting some forms of locality within dataflow programs can impact performance dramatically. This is the basic premise of several hybrid von Neumann-dataflow or multithreaded architectures. Identifying and exploiting locality, however, in a fine-grained asynchronous execution model is not trivial. In this paper, fine-grained intra-thread locality is defined, quantified and evaluated. These experimental measurements are based on the evaluation of a set of numeric and non-numeric benchmarks. The results point to a very large degree of thread locality: for example, over 70% of the instructions have to wait less than 5 instruction execution steps for their input data. Furthermore, the remarkable uniformity and consistency of the distribution of thread locality across a wide variety of benchmarks suggests that thread locality is highly dependent on the instruction set.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126254030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Register file/cache microarchitecture study using VHDL
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123510
Samarina Makhdoom, D. Tabak, R. Auletta
{"title":"Register file/cache microarchitecture study using VHDL","authors":"Samarina Makhdoom, D. Tabak, R. Auletta","doi":"10.1145/123465.123510","DOIUrl":"https://doi.org/10.1145/123465.123510","url":null,"abstract":"The influence on processor performance of CPU register file size versus on-chip cache size in a RISC-type microprocessor is investigated using VHDL modeling. The Intel 80860 (or i860) was selected as a model for this study. The Linpack benchmark was used as an example for generating performance estimates. The i860 microarchitecture was modeled and simulated using VHDL. The i860 performance executing the Linpack benchmark was tested while modifying the size of its floating point register file (actual size: 32 32-bit, or 16 64-bit registers). The model was compiled and simulated using the Intermetrics version 3.0 VHDL toolset on a Sun-3 workstation. An instruction classification scheme, called the generic model, was developed in the course of this study. It allows rapid characterization of applications by modeling them by the distribution of instructions and their relevant properties without the need to fully specify the corresponding code or target processor architecture. The results clearly indicate a significant increase in performance while executing the selected benchmark when the register file size is doubled. Further increases in the register file size result in modest increases in performance. The study also shows that in order to achieve the same performance improvement by increasing only the cache size one would have to increase the cache by more than an order of magnitude, considerably exceeding current limitations of VLSI technology.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122210165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Viewing instruction set design as an optimization problem
MICRO 24 Pub Date : 1991-09-01 DOI: 10.1145/123465.123497
Bruce K. Holmer, A. Despain
{"title":"Viewing instruction set design as an optimization problem","authors":"Bruce K. Holmer, A. Despain","doi":"10.1145/123465.123497","DOIUrl":"https://doi.org/10.1145/123465.123497","url":null,"abstract":"This paper reviews past attempts to systematize instruction set design and offers an alternative approach. Our technique is based on compaction of microoperations to form instructions. The compaction is done in such a way as to optimize a metric which is a function of cycle count, code size, and instruction set size. To illustrate our technique, optimal instruction sets are derived for data structure creation in Prolog.","PeriodicalId":118572,"journal":{"name":"MICRO 24","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115033259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23