[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture最新文献

筛选
英文 中文
A study of I/O behavior of Perfect benchmarks on a multiprocessor 多处理器上完美基准的I/O行为研究
A. Reddy, P. Banerjee
{"title":"A study of I/O behavior of Perfect benchmarks on a multiprocessor","authors":"A. Reddy, P. Banerjee","doi":"10.1145/325164.325157","DOIUrl":"https://doi.org/10.1145/325164.325157","url":null,"abstract":"The I/O behavior of some scientific applications, a subset of Perfect benchmarks, executing on a multiprocessor is studied. The aim of this study is to explore the various patterns of I/O access of large scientific applications, and to understand the impact of this observed behavior on the I/O subsystem architecture. I/O behavior of the program is characterized by the demands it imposes on the I/O subsystem. It is observed that implicit I/O or paging is not a major problem for the applications considered and the I/O problem is mainly manifest in the explicit I/O done in the program. Various characteristics of I/O accesses are studied, and their impact on architecture design is discussed.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122266865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 44
The directory-based cache coherence protocol for the DASH multiprocessor DASH多处理器基于目录的缓存一致性协议
D. Lenoski, J. Laudon, K. Gharachorloo, Anoop Gupta, J. Hennessy
{"title":"The directory-based cache coherence protocol for the DASH multiprocessor","authors":"D. Lenoski, J. Laudon, K. Gharachorloo, Anoop Gupta, J. Hennessy","doi":"10.1145/325164.325132","DOIUrl":"https://doi.org/10.1145/325164.325132","url":null,"abstract":"DASH is a scalable shared-memory multiprocessor whose architecture consists of powerful processing nodes, each with a portion of the shared-memory, connected to a scalable interconnection network. A key feature of DASH is its distributed direction-based cache coherence protocol. Unlike traditional snoopy coherence protocols, the DASH protocol does not rely on broadcast; instead it uses point-to-point messages sent between the processors and memories to keep caches consistent. Furthermore, the DASH system does not contain any single serialization or control point. While these features provide the basis for scalability, they also force a reevaluation of many fundamental issues involved in the design of a protocol. These include the issues of correctness, performance, and protocol complexity. The design of the DASH coherence protocol is presented and discussed from the viewpoint of how it addresses the above issues. Also discussed is a strategy for verifying the correctness of the protocol. A brief comparison of the protocol with the IEEE Scalable Coherent Interface protocol is made.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115896369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 736
An empirical evaluation of two memory-efficient directory methods 两种内存效率目录方法的经验评价
Brian W. O'Krafka, A. Richard Newton
{"title":"An empirical evaluation of two memory-efficient directory methods","authors":"Brian W. O'Krafka, A. Richard Newton","doi":"10.1145/325164.325130","DOIUrl":"https://doi.org/10.1145/325164.325130","url":null,"abstract":"The authors present an empirical evaluation of two memory-efficient directory methods for maintaining coherent caches in large shared-memory multiprocessors. Both directory methods are modifications of a scheme proposed by L.M. Censier and P. Feautrier (1978) that does not rely on a specific interconnection network and can be readily distributed across interleaved main memory. The schemes considered here overcome the large amount of memory required for tags in the original scheme in two different ways. In the first scheme each main memory block is sectored into sub-blocks for which the large tag overhead is shared. In the second scheme a limited number of large tags are stored in an associative cache and shared among a much larger number of main memory blocks. Simulations show that in terms of access time and network traffic both directory methods provide significant performance improvements over a memory system in which shared-writable data are not cached. The large block sizes required for the sectored scheme, however, promote sufficient false sharing for its performance to be markedly worse than when a tag cache is used.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125034042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 135
Performance comparison of load/store and symmetric instruction set architectures 加载/存储和对称指令集体系结构的性能比较
D. Alpert, A. Averbuch, O. Danieli
{"title":"Performance comparison of load/store and symmetric instruction set architectures","authors":"D. Alpert, A. Averbuch, O. Danieli","doi":"10.1145/325164.325137","DOIUrl":"https://doi.org/10.1145/325164.325137","url":null,"abstract":"Two pipeline models, one implementing a load/store architecture, the other a symmetric architecture, are compared under identical simulation environments. The symmetric architecture instructions are more powerful, but also more complex; therefore the pipeline model for the symmetric architecture contains an additional stage with an additional adder, more bypasses, and an extra port to the register file. The authors' simulations show that the path length of the load/store architecture is 1.12 longer than that of the symmetric architecture. Nevertheless, most of this advantage is lost because of various pipeline delays that reduce the speedup factor from 1.12 to 1.0375. The main delaying contribution is due to resource dependency (0.064 CPI) and control dependency (0.048 CPI).<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132051764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Performance of an OLTP application on Symmetry multiprocessor system 对称多处理器系统上OLTP应用程序的性能
S. Thakkar, Mark Sweiger
{"title":"Performance of an OLTP application on Symmetry multiprocessor system","authors":"S. Thakkar, Mark Sweiger","doi":"10.1145/325164.325149","DOIUrl":"https://doi.org/10.1145/325164.325149","url":null,"abstract":"Sequent's Symmetry series is a bus-based shared-memory multiprocessor. System performance in an OLTP (online transaction processing) relational database application was investigated using the TP1 benchmark. System performance was tested with fully cached benchmarks and with scaled benchmarks. In fully-cached tests, the entire database fits inside main memory. In scaled tests, the database is larger than available memory. In the fully-cached benchmark, performance was initially limited by bus saturation. The cause was the transfer of process context from processor to processor. This was eliminated by assigning each process to a processor. Processor affinity was combined with reductions in message passing within the database. Throughput was dramatically improved. The scaled tests were I/O bound. This bottleneck can be eliminated by connecting more disk drives or by increasing the main memory size.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126844368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 75
Balance in architectural design 建筑设计中的平衡
Samuel Ho, L. Snyder
{"title":"Balance in architectural design","authors":"Samuel Ho, L. Snyder","doi":"10.1145/325164.325156","DOIUrl":"https://doi.org/10.1145/325164.325156","url":null,"abstract":"A performance metric, normalized time, which is closely related to such measures as the area-time product of VLSI theory and the price/performance ratio of advertising literature is introduced. This metric captures the idea of a piece of hardware 'pulling its own weight', that is, contributing as much to performance as it costs in resources. The authors prove general theorems for stating when the size of a given part is in balance with its utilization and give specific formulas for commonly found linear and quadratic devices. They also apply these formulas to an analysis of a specific processor element and discuss the implications for bit-serial-versus-word-parallel, RISC-versus-CISC (reduced-versus complex-instruction-set-computer), and VLIW (very-long-instruction-word) designs.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128085581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Performance measurement and trace driven simulation of parallel CAD and numeric applications on a hypercube multicomputer 在超立方体多计算机上并行CAD和数值应用程序的性能测量和跟踪驱动仿真
Jiun-Ming Hsu, P. Banerjee
{"title":"Performance measurement and trace driven simulation of parallel CAD and numeric applications on a hypercube multicomputer","authors":"Jiun-Ming Hsu, P. Banerjee","doi":"10.1145/325164.325152","DOIUrl":"https://doi.org/10.1145/325164.325152","url":null,"abstract":"The performance evaluation, workload characterization, and trace-driven simulation of a hypercube multicomputer running realistic workloads are presented. Six representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. On the basis of the measurement results, the authors investigated both the computation and communication behavior of these parallel programs, including CPU utilization, computation task granularity, message interarrival distribution, the distribution of waiting times in receiving messages, and message length and destination distributions. The localities in communication were also studied. A trace-driven simulation environment was developed to study the behavior of the communication hardware under real workloads. Simulation results on DMA and link utilizations are reported.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132623794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 63
Supporting systolic and memory communication in iWarp 支持收缩和内存通信在iWarp
S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, J. Webb
{"title":"Supporting systolic and memory communication in iWarp","authors":"S. Borkar, R. Cohn, G. Cox, T. Gross, H. T. Kung, M. Lam, M. Levine, B. Moore, W. Moore, C. Peterson, J. Susman, J. Sutton, J. Urbanski, J. Webb","doi":"10.1145/325164.325116","DOIUrl":"https://doi.org/10.1145/325164.325116","url":null,"abstract":"The iWarp communication system supports two widely used interprocessor communication styles: memory communication and systolic communication. A description is given of the rationale, architecture, and implementation for the iWarp communication system. Memory communication is flexible and well suited for general computing, whereas systolic communication is efficient and well suited for speed-critical applications. The iWarp design is made possible by two important innovations in communication: (1) program access to communication and (2) logical channels. The former allows programs to access data as they are transmitted and to redirect portions of messages to different destinations efficiently. The latter increases the connectivity between the processors and guarantees communication bandwidth for classes of messages. These innovations have provided a focus for the iWarp architecture. The result is a communication system that provides a total bandwidth of 320 MBytes/sec and that is integrated on a single VLSI component with a 20 MFLOPS plus 20 MIPS long instruction work computation engine.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127090417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 210
Virtual-channel flow control 虚拟通道流量控制
W. Dally
{"title":"Virtual-channel flow control","authors":"W. Dally","doi":"10.1145/325164.325115","DOIUrl":"https://doi.org/10.1145/325164.325115","url":null,"abstract":"Network throughput can be increased by dividing the buffer storage associated with each network channel into several virtual channels. Each physical channel is associated with several small queues, virtual channels, rather than a single deep queue. The virtual channels associated with one physical channel are allocated independently but compete with each other for physical bandwidth. Virtual channels decouple buffer resources from transmission resources. This decoupling allows active messages to pass blocked messages using network bandwidth that would otherwise be left idle. Simulation studies show that, given a fixed amount of buffer storage per link, virtual-channel flow control increases throughput by a factor of 3.5, approaching the capacity of the network.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128095396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1658
Weak ordering-a new definition 弱有序——一个新的定义
S. Adve, M. Hill
{"title":"Weak ordering-a new definition","authors":"S. Adve, M. Hill","doi":"10.1145/285930.285996","DOIUrl":"https://doi.org/10.1145/285930.285996","url":null,"abstract":"A memory model for a shared-memory multiprocessor commonly and often implicitly assumed by programmers is that of sequential consistency, which guarantees that all memory accesses will appear to execute atomically and in program order. An alternative model, weak ordering, offers greater performance potential. The central hypothesis of this work is that programmers prefer to reason about sequentially consistent memory, rather than have to think about weaker memory, or even write buffers. Following this hypothesis, weak ordering is defined as a contract between software and hardware. By this contract, software agrees to some formally specified constraints, and hardware agrees to appear sequentially consistent, at least to the software that obeys those constraints. The authors illustrate the power of the new definition with a set of software constraints that forbid data races and with an implementation for cache-coherent systems that is not allowed by the old definition.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115173988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 178
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信