Proceedings of the 20th Annual International Symposium on Computer Architecture: Latest Publications

A Case For Two-way Skewed-associative Caches
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1109/ISCA.1993.698558
André Seznec
Abstract: We introduce a new organization for multi-bank caches: the skewed-associative cache. A two-way skewed-associative cache has the same hardware complexity as a two-way set-associative cache, yet simulations show that it typically exhibits the same hit ratio as a four-way set-associative cache of the same size; skewed-associative caches should therefore be preferred to set-associative caches. Until the last three years, external caches were used and their size could be relatively large. Previous studies have shown that, for cache sizes larger than 64 Kbytes, direct-mapped caches exhibit hit ratios nearly as good as set-associative caches at a lower hardware cost. Moreover, the cache hit time of a direct-mapped cache may be quite a bit smaller than that of a set-associative cache, because optimistic use of the data flowing out of the cache is quite natural. But now, microprocessors are designed with small on-chip caches, and the performance of low-end microprocessor systems depends heavily on cache behavior. Simulations show that using some associativity in on-chip caches boosts the performance of these low-end systems. When considering optimistic use of the data (or instructions) flowing out of the cache, the cache hit time of a two-way skewed-associative (or set-associative) cache is very close to that of a direct-mapped cache. Therefore two-way skewed-associative caches represent the best tradeoff for today's microprocessors with on-chip caches whose sizes are in the range of 4-8 Kbytes.
Citations: 278
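As an illustration of the organization described above, the following minimal Python sketch models a two-way skewed-associative lookup: each way indexes its frames with a different function of the block address, so two blocks that conflict in one way are unlikely to also conflict in the other. The cache geometry and the XOR-based index functions are assumptions made for the example, not the skewing functions defined in the paper.

```python
# Minimal sketch of a two-way skewed-associative cache (illustrative only).

NUM_SETS = 64          # frames per way (assumed geometry)
BLOCK_BITS = 5         # 32-byte blocks

def index_way0(block_addr):
    # conventional modulo indexing for way 0
    return block_addr % NUM_SETS

def index_way1(block_addr):
    # a different function of the same address bits for way 1, so blocks
    # that collide in way 0 usually map to distinct frames in way 1
    low = block_addr % NUM_SETS
    high = (block_addr // NUM_SETS) % NUM_SETS
    return low ^ high

class SkewedCache:
    def __init__(self):
        self.ways = [dict(), dict()]       # frame index -> block address, one dict per way

    def lookup(self, addr):
        block = addr >> BLOCK_BITS
        for w, index_fn in enumerate((index_way0, index_way1)):
            if self.ways[w].get(index_fn(block)) == block:
                return True                # hit in way w
        return False                       # miss: only two candidate frames were probed

    def insert(self, addr):
        block = addr >> BLOCK_BITS
        # place the block in whichever candidate frame is empty; a real
        # design would apply a replacement policy when both are occupied
        for w, index_fn in enumerate((index_way0, index_way1)):
            s = index_fn(block)
            if s not in self.ways[w]:
                self.ways[w][s] = block
                return
        self.ways[0][index_way0(block)] = block
```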
Register Connection: A New Approach To Adding Registers Into Instruction Set Architectures
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1109/ISCA.1993.698565
T. Kiyohara, S. Mahlke, William Y. Chen, Roger A. Bringmann, R. Hank, S. Anik, Wen-mei W. Hwu
Abstract: Code optimization and scheduling for superscalar and superpipelined processors often increase the register requirement of programs. For existing instruction sets with a small to moderate number of registers, this increased register requirement can be a factor that limits the effectiveness of the compiler. In this paper, we introduce a new architectural method for adding a set of extended registers into an architecture. Using a novel concept of connection, this method allows the data stored in the extended registers to be accessed by instructions that apparently reference core registers. Furthermore, we address the technical issues involved in applying the new method to an architecture: instruction set extension, procedure call convention, context switching considerations, upward compatibility, efficient implementation, compiler support, and performance. Experimental results based on a prototype compiler and execution-driven simulation show that the proposed method can significantly improve the performance of superscalar processors with a small or moderate number of registers.
Citations: 39
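The following toy model illustrates the connection concept described in the abstract: a connect operation redirects a core register name to an extended register, so instructions that apparently reference the core register actually access the extended one. The table-based model, register counts, and function names are assumptions for illustration, not the paper's ISA encoding.

```python
# Illustrative model of register connection (not the paper's actual ISA).

NUM_CORE = 32
NUM_EXT = 64

core = [0] * NUM_CORE
ext = [0] * NUM_EXT
# connection table: core register name -> ('core', index) or ('ext', index)
conn = [('core', i) for i in range(NUM_CORE)]

def connect(core_name, ext_index):
    conn[core_name] = ('ext', ext_index)

def disconnect(core_name):
    conn[core_name] = ('core', core_name)

def read(core_name):
    kind, idx = conn[core_name]
    return ext[idx] if kind == 'ext' else core[idx]

def write(core_name, value):
    kind, idx = conn[core_name]
    if kind == 'ext':
        ext[idx] = value
    else:
        core[idx] = value

connect(5, 40)        # r5 now names extended register 40
write(5, 123)         # an unmodified instruction writing r5 reaches ext[40]
assert ext[40] == 123 and core[5] == 0
```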
Architectural Support For Translation Table Management In Large Address Space Machines
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1109/ISCA.1993.698544
Jerome C. Huck, Jim Hays
Abstract: Virtual memory page translation tables provide mappings from virtual to physical addresses. When the hardware-controlled Translation Lookaside Buffers (TLBs) do not contain a translation, these tables provide it. Approaches to the structure and management of these tables vary from full hardware implementations to completely software-based algorithms. The size of the virtual address space used by processes is rapidly growing beyond 32 bits of address, and as the utilized address space increases, new problems and issues surface; traditional methods for managing the page translation tables are inappropriate for large address space architectures. The Hashed Page Table (HPT), described here, provides a very fast and space-efficient translation table that reduces overhead by splitting TLB management responsibilities between hardware and software. Measurements demonstrate its applicability to a diverse range of operating systems and workloads and, in particular, to large virtual address space machines. In simulations of over 4 billion instructions, improvements of 5 to 10% were observed.
Citations: 144
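To make the hashed-table idea concrete, here is a minimal sketch of an HPT-style lookup on a TLB miss: the virtual page number, combined with a space identifier, hashes to a bucket whose entries are searched for a match. The hash function, table size, and entry layout are illustrative assumptions, not the exact HPT format of the paper.

```python
# Illustrative hashed page table lookup (assumed layout, not the paper's format).

PAGE_SHIFT = 12
HPT_SIZE = 1 << 16                       # number of hash buckets (assumed)

hpt = [[] for _ in range(HPT_SIZE)]      # bucket: list of (space_id, vpn, frame)

def hpt_hash(vpn, space_id):
    # fold the space identifier into the hash so large sparse address
    # spaces spread evenly over the table
    return (vpn ^ space_id) % HPT_SIZE

def insert_mapping(vaddr, space_id, frame):
    vpn = vaddr >> PAGE_SHIFT
    hpt[hpt_hash(vpn, space_id)].append((space_id, vpn, frame))

def translate(vaddr, space_id):
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for entry_space, entry_vpn, frame in hpt[hpt_hash(vpn, space_id)]:
        if entry_space == space_id and entry_vpn == vpn:
            return (frame << PAGE_SHIFT) | offset
    return None                          # translation fault: software supplies the mapping

insert_mapping(0x7fff2000, space_id=42, frame=0x1234)
assert translate(0x7fff2abc, 42) == (0x1234 << PAGE_SHIFT) | 0xabc
```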
Parity Logging Overcoming The Small Write Problem In Redundant Disk Arrays
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1145/165123.165143
Daniel Stodolsky, G. Gibson, M. Holland
Abstract: Parity-encoded redundant disk arrays provide highly reliable, cost-effective secondary storage with high performance for read accesses and large write accesses. Their performance on small writes, however, is much worse than that of mirrored disks, the traditional, highly reliable, but expensive organization for secondary storage. Unfortunately, small writes are a substantial portion of the I/O workload of many important, demanding applications such as on-line transaction processing. This paper presents parity logging, a novel solution to the small write problem for redundant disk arrays. Parity logging applies journalling techniques to substantially reduce the cost of small writes. We provide a detailed analysis of parity logging and competing schemes (mirroring, floating storage, and RAID level 5) and verify these models by simulation. Parity logging provides performance competitive with mirroring, the best of the alternative single-failure-tolerating disk array organizations, while its overhead cost is close to the minimum offered by RAID level 5. Finally, parity logging can exploit data caching much more effectively than all three alternative approaches.
Citations: 164
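The sketch below illustrates the parity-logging idea for a single stripe: a small write appends the XOR difference between old and new data to a log instead of updating parity in place, and the log is later applied to the parity block in one batch. Block contents are modeled as integers for brevity; a real array logs whole sectors and keeps the log on disk.

```python
# Illustrative sketch of parity logging for one parity stripe.

data_blocks = [0] * 4          # data blocks in the stripe
parity = 0                     # parity = XOR of all data blocks
parity_log = []                # journalled parity updates

def small_write(block_index, new_value):
    old_value = data_blocks[block_index]
    data_blocks[block_index] = new_value
    # record the XOR update image instead of read-modify-writing parity now
    parity_log.append(old_value ^ new_value)

def flush_parity_log():
    # apply all logged updates to the parity block in one efficient pass
    global parity
    for update in parity_log:
        parity ^= update
    parity_log.clear()

small_write(2, 0b1011)
small_write(0, 0b0110)
flush_parity_log()
assert parity == data_blocks[0] ^ data_blocks[1] ^ data_blocks[2] ^ data_blocks[3]
```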
Mechanisms For Cooperative Shared Memory
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1109/ISCA.1993.698554
D. Wood, S. Chandra, B. Falsafi, M. Hill, J. Larus, A. Lebeck, James C. Lewis, Shubhendu S. Mukherjee, Subbarao Palacharla, S. Reinhardt
Abstract: This paper explores the complexity of implementing directory protocols by examining their mechanisms: primitive operations on directories, caches, and network interfaces. We compare the following protocols: Dir1B, Dir4B, Dir4NB, DirnNB [2], Dir1SW [9], and an improved version of Dir1SW (Dir1SW+). The comparison shows that the mechanisms and mechanism sequencing of Dir1SW and Dir1SW+ are simpler than those of the other protocols. We also compare protocol performance by running eight benchmarks on 32-processor systems. Simulations show that Dir1SW+'s performance is comparable to more complex directory protocols. The significant disparity in hardware complexity and the small difference in performance argue that Dir1SW+ may be a more effective use of resources. The small performance difference is attributable to two factors: the low degree of sharing in the benchmarks and Check-In/Check-Out (CICO) directives [9].
Citations: 56
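For context on the mechanisms being compared, the sketch below shows a generic full-map directory entry (in the style of the DirnNB scheme the paper uses as a reference point), composed of the primitive operations a protocol sequences: presence-bit lookup, pointer update, and invalidation sends. It is an illustration of directory mechanisms in general, not the Dir1SW+ design.

```python
# Generic full-map directory entry sketch (illustrative, DirnNB-style).

NUM_PROCS = 32

class FullMapEntry:
    def __init__(self):
        self.present = [False] * NUM_PROCS   # which caches hold the block
        self.dirty = False                   # a single writable copy exists

    def handle_read(self, proc, send_msg):
        if self.dirty:
            owner = self.present.index(True)
            send_msg(owner, 'writeback')     # retrieve the modified copy first
            self.dirty = False
        self.present[proc] = True

    def handle_write(self, proc, send_msg):
        for p, has_copy in enumerate(self.present):
            if has_copy and p != proc:
                send_msg(p, 'invalidate')    # purge all other copies
                self.present[p] = False
        self.present[proc] = True
        self.dirty = True

msgs = []
entry = FullMapEntry()
entry.handle_read(3, lambda dst, m: msgs.append((dst, m)))
entry.handle_write(7, lambda dst, m: msgs.append((dst, m)))
assert msgs == [(3, 'invalidate')]
```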
A Comparison Of Adaptive Wormhole Routing Algorithms
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1109/ISCA.1993.698575
R. Boppana, S. Chalasani
Abstract: Improvement of message latency and network utilization in torus interconnection networks by increasing adaptivity in wormhole routing algorithms is studied. A recently proposed partially adaptive algorithm and four new fully adaptive routing algorithms are compared with the well-known e-cube algorithm for uniform, hotspot, and local traffic patterns. Our simulations indicate that the partially adaptive north-last algorithm, which causes unbalanced traffic in the network, performs worse than the nonadaptive e-cube routing algorithm for all three traffic patterns. Another result of our study is that performance does not necessarily improve with full adaptivity. In particular, a commonly discussed fully adaptive routing algorithm, which uses 2n virtual channels per physical channel of a k-ary n-cube, performs worse than e-cube for uniform and hotspot traffic patterns. The other three fully adaptive algorithms, which give priority to messages based on distance traveled, perform much better than the e-cube and partially adaptive algorithms for all three traffic patterns. One conclusion of this study is that adaptivity, full or partial, is not necessarily a benefit in wormhole routing.
Citations: 184
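As a baseline for the comparison above, the following sketch shows nonadaptive e-cube (dimension-ordered) routing on a k-ary 2-cube: a message corrects its offset fully in one dimension before moving to the next, which makes the algorithm simple and deadlock-free but inflexible. The radix and the minimal-direction choice on the wraparound links are assumptions of this sketch.

```python
# Illustrative e-cube (dimension-ordered) routing on a k-ary 2-cube.

K = 8   # radix of the torus in each dimension (assumed)

def next_hop(current, dest):
    """current, dest: (x, y) node coordinates; returns the next node on the route."""
    for dim in range(2):                        # fix dimension 0 first, then dimension 1
        if current[dim] != dest[dim]:
            dist = (dest[dim] - current[dim]) % K
            step = 1 if dist <= K // 2 else -1  # take the shorter way around the ring
            nxt = list(current)
            nxt[dim] = (current[dim] + step) % K
            return tuple(nxt)
    return current                              # already at the destination

assert next_hop((0, 0), (2, 3)) == (1, 0)       # x corrected before y
assert next_hop((2, 0), (2, 3)) == (2, 1)
```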
Limitations Of Cache Prefetching On A Bus-based Multiprocessor
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1145/165123.165163
D. Tullsen, S. Eggers
Abstract: Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen by current and future high-performance processors. However, prefetching is not without costs, particularly on a multiprocessor. Prefetching can negatively affect bus utilization, overall cache miss rates, memory latencies, and data sharing. We simulated the effects of a particular compiler-directed prefetching algorithm running on a bus-based multiprocessor. We showed that, despite a high memory latency, this architecture is not very well suited for prefetching. For several variations on the architecture, speedups for five parallel programs were no greater than 39%, and degradations were as high as 7%, when prefetching was added to the workload. We examined the sources of cache misses in light of several different prefetching strategies and pinpointed the causes of the performance changes. Invalidation misses pose a particular problem for current compiler-directed prefetchers. We applied two techniques that reduced their impact: a special prefetching heuristic tailored to write-shared data, and restructuring shared data to reduce false sharing, thus allowing traditional prefetching algorithms to work well.
Citations: 77
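For readers unfamiliar with the mechanism being evaluated, the sketch below shows the effect of compiler-directed prefetching at the source level: the access in iteration i is preceded by a non-binding prefetch of the data needed a fixed number of iterations later, so the line arrives before it is used. The prefetch distance and the prefetch() stand-in are illustrative assumptions, not the specific algorithm simulated in the paper.

```python
# Illustrative loop with compiler-inserted prefetches (toy model).

def prefetch(array, index):
    # stand-in for a non-binding prefetch instruction: a real machine would
    # start moving array[index] toward the cache without blocking execution
    pass

def sum_array(a, distance=8):
    total = 0
    for i in range(len(a)):
        if i + distance < len(a):
            prefetch(a, i + distance)   # issued 'distance' iterations ahead of use
        total += a[i]
    return total

assert sum_array(list(range(100))) == sum(range(100))
```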
Cache Write Policies And Performance
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1109/ISCA.1993.698560
N. Jouppi
Abstract: This paper investigates issues involving writes and caches. First, tradeoffs on writes that miss in the cache are investigated: in particular, whether the missed cache block is fetched on a write miss, whether the missed cache block is allocated in the cache, and whether the cache line is written before hit or miss is known. Depending on the combination of these policies chosen, the overall cache miss rate can vary by a factor of two on some applications. The combination of no-fetch-on-write and write-allocate can provide better performance than cache line allocation instructions. Second, tradeoffs between write-through and write-back caching when writes hit in a cache are considered. A mixture of these two alternatives, called write caching, is proposed. Write caching places a small fully associative cache behind a write-through cache. A write cache can eliminate almost as much write traffic as a write-back cache.
Citations: 254
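The sketch below spells out the two write-miss policy axes the abstract refers to: whether the missing block is fetched (fetch-on-write) and whether a cache frame is allocated for it (write-allocate), shown configured for the no-fetch-on-write plus write-allocate combination the paper highlights. The toy cache structure and the callback names are assumptions made for illustration.

```python
# Illustrative write-miss handling under the two policy axes (toy model).

FETCH_ON_WRITE = False
WRITE_ALLOCATE = True          # the no-fetch-on-write / write-allocate combination

cache = {}                     # block address -> dict of valid words in the block

def handle_write_miss(block_addr, word_offset, value, read_block, write_word):
    """read_block and write_word stand in for the memory-system interface."""
    if WRITE_ALLOCATE:
        if FETCH_ON_WRITE:
            block = read_block(block_addr)      # bring the whole block in first
        else:
            # allocate the frame without fetching; only the written word is
            # valid so far (per-word valid bits are implied, not modeled)
            block = {}
        block[word_offset] = value
        cache[block_addr] = block               # later writes to this block now hit
    else:
        write_word(block_addr, word_offset, value)  # write around the cache

handle_write_miss(0x40, 2, 99,
                  read_block=lambda addr: {},
                  write_word=lambda addr, off, val: None)
assert cache[0x40][2] == 99
```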
Odd Memory Systems May Be Quite Interesting
Proceedings of the 20th Annual International Symposium on Computer Architecture Pub Date: 1993-05-01 DOI: 10.1109/ISCA.1993.698574
André Seznec, J. Lenfant
Abstract: Using a prime number N of memory banks on a vector processor allows conflict-free access for any slice of N consecutive elements of a vector stored with a stride that is not a multiple of N. To reject the use of a prime (or odd) number N of memory banks, it is generally argued that address computation for such a memory system would require a systematic Euclidean division by the number N. We first show that the well-known Chinese Remainder Theorem allows a very simple mapping of data onto the memory banks for which address computation does not require any Euclidean division. Massively parallel SIMD computers may have several thousand processors. When the memory on such a machine is globally shared, routing vectors from memory to the processors is a major difficulty; the control for the interconnection network generally cannot be computed at execution time. When the number of memory banks and processors is a product of prime numbers, the family of permutations needed for routing vectors from memory to the processors through the interconnection network has very specific properties. The Chinese Remainder Network presented in the paper is able to execute all these permutations in a single pass and may be self-routed.
Citations: 15
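The following worked sketch shows the Chinese Remainder Theorem mapping the abstract alludes to, under the assumption that the number of banks N is odd while the bank depth M is a power of two: address a goes to bank a mod N at local offset a mod M, and since gcd(N, M) = 1 the mapping is one-to-one over N*M addresses, while the local offset needs only a bit mask rather than a division by N.

```python
# Worked example of the CRT-based bank mapping (assumed parameters).

N = 7          # number of banks (odd / prime)
M = 16         # words per bank (power of two), so gcd(N, M) = 1

def map_address(a):
    bank = a % N
    offset = a & (M - 1)     # a mod M, computed by masking, no division by N
    return bank, offset

# the mapping is one-to-one over the whole address range [0, N*M)
assert len({map_address(a) for a in range(N * M)}) == N * M

# N consecutive elements accessed with a stride not a multiple of N
# fall into N distinct banks, hence the conflict-free access
stride = 3
banks = [(i * stride) % N for i in range(N)]
assert len(set(banks)) == N
```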
The Cedar System And An Initial Performance Study
Proceedings of the 20th Annual International Symposium on Computer Architecture DOI: 10.1145/285930.286005
D. Kuck, E. Davidson, D. Lawrie, A. Sameh, Chuanqi Zhu
Abstract: In this paper, we give an overview of the Cedar multiprocessor and present recent performance results. These include the performance of some computational kernels and the Perfect Benchmarks®. We also present a methodology for judging parallel system performance and apply this methodology to Cedar, Cray YMP-8, and Thinking Machines CM-5.
Citations: 49