[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture最新文献

筛选
英文 中文
Fast Prolog with an extended general purpose architecture 具有扩展的通用架构的快速Prolog
Bruce K. Holmer, B. Sano, M. Carlton, P. V. Roy, R. Haygood, W. Bush, A. Despain, J. Pendleton, T. Dobry
{"title":"Fast Prolog with an extended general purpose architecture","authors":"Bruce K. Holmer, B. Sano, M. Carlton, P. V. Roy, R. Haygood, W. Bush, A. Despain, J. Pendleton, T. Dobry","doi":"10.1145/325164.325154","DOIUrl":"https://doi.org/10.1145/325164.325154","url":null,"abstract":"Most Prolog machines have been based on specialized architectures. The authors' goal is to start with a general-purpose architecture and determine a minimal set of extensions for high-performance Prolog execution. They have developed both the architecture and optimizing compiler simultaneously, drawing on results of previous implementations. They find that most Prolog-specific operations can be done satisfactorily in software; however, there is a crucial set of features that the architecture must support to achieve the best Prolog performance. The emphasis in this study is on the authors' architecture and instruction set. The costs and benefits of the special architectural features and instructions are analyzed. Simulated performance results are presented and indicate a peak compiled Prolog performance of 3.68 million logical inferences per second.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126549216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 53
Monsoon: an explicit token-store architecture Monsoon:一个显式的令牌存储架构
G. Papadopoulos, D. Culler
{"title":"Monsoon: an explicit token-store architecture","authors":"G. Papadopoulos, D. Culler","doi":"10.1145/285930.285999","DOIUrl":"https://doi.org/10.1145/285930.285999","url":null,"abstract":"Data-flow architectures tolerate long unpredictable communication delays and support generation and coordination of parallel activities directly in hardware, instead of assuming that program mapping will cause these issues to disappear. However, the proposed mechanisms are complex and introduce new mapping complications. A greatly simplified approach to data-flow execution, called the explicit token store (ETS) architecture, and its current realization in Monsoon are presented. The essence of dynamic data-flow execution is captured by a simple transition on state bits associated with storage local to a processor. Low-level storage management is performed by the compiler in assigning nodes to slots in an activation frame, rather than dynamically in hardware. The processor is simple, highly pipelined, and quite general. It may be viewed as a generalization of a fairly primitive von Neumann architecture. Although the addressing capability is restrictive, there is exactly one instruction executed for each action on the data-flow graph. Thus, the machine-originated ETS model provides new understanding of the merits and the real cost of direct execution of data-flow graphs.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116666173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Generation and analysis of very long address traces 生成和分析非常长的地址轨迹
A. Borg, R. Kessler, D. W. Wall
{"title":"Generation and analysis of very long address traces","authors":"A. Borg, R. Kessler, D. W. Wall","doi":"10.1145/325164.325153","DOIUrl":"https://doi.org/10.1145/325164.325153","url":null,"abstract":"Existing methods of generating and analyzing traces suffer from a variety of limitations, including complexity, inaccuracy, short length, inflexibility, or applicability only to CISC (complex-instruction-set-computer) machines. The authors use a trace-generation mechanism based on link-time code modification which is simple to use, generates accurate long traces of multiuser programs, runs on a RISC (reduced-instruction-set-computer) machine, and can be flexibly controlled. Accurate performance data for large second-level caches can be obtained by on-the-fly analysis of the traces. A comparison is made of the performance of systems with 512 K to 16 M second-level caches, and it is show that, for today's large programs, second-level caches of more than 4 MB may be unnecessary. It is also shown that set associativity in second-level caches of more than 1 MB does not significantly improve system performance. In addition, the experiments provide insights into first-level and second-level cache line size.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123426876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 137
A distributed I/O architecture for HARTS 用于hart的分布式I/O体系结构
K. Shin, G. Dykema
{"title":"A distributed I/O architecture for HARTS","authors":"K. Shin, G. Dykema","doi":"10.1145/325164.325159","DOIUrl":"https://doi.org/10.1145/325164.325159","url":null,"abstract":"The issue of I/O device access in HARTS (Hexagonal Architecture for Real-Time Systems)-a distributed real-time computer system under construction at the University of Michigan-is explicitly addressed. Several candidate solutions are introduced, explored and evaluated according to cost, complexity, reliability, and performance: (1) 'node-direct' distribution with the intranode bus and a local I/O bus; (2) use of dedicated I/O nodes, which are placed in the hexagonal mesh as regular applications nodes, but which provide I/O services rather than computing services; and (3) use of a separate I/O network; which has led to the proposal of an 'interlaced' I/O network. The interlaced I/O network is intended to provide both high performance without burdening node processors with I/O overhead and a high degree of reliability. Both static and dynamic multiownership protocols are developed for managing I/O device access in this I/O network. The relative merits of the two protocols are explored, and the performance and accessibility which each provides are simulated.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130712843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Multiple instruction issue in the NonStop Cyclone processor 多指令问题在直达旋风处理器
R. Horst, R. L. Harris, Robert L. Jardine
{"title":"Multiple instruction issue in the NonStop Cyclone processor","authors":"R. Horst, R. L. Harris, Robert L. Jardine","doi":"10.1145/325164.325147","DOIUrl":"https://doi.org/10.1145/325164.325147","url":null,"abstract":"The architecture for issuing multiple instructions per clock in the NonStop Cyclone processor is described. Pairs of instructions are fetched and decoded by a dual two-stage prefetch pipeline and passed to a dual six-stage pipeline for execution. Dynamic branch prediction is used to reduce branch penalties. A unique microcode routine for each pair is stored in the large duplexed control store. The microcode controls parallel data paths optimized for executing the most frequent instruction pairs. Other features of the architecture include cache support for unaligned double-precision accesses, a virtually addressed main memory, and a novel precise exception mechanism.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134083732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 65
Reducing the cost of branches by using registers 通过使用寄存器来减少分支的开销
J. Davidson, D. Whalley
{"title":"Reducing the cost of branches by using registers","authors":"J. Davidson, D. Whalley","doi":"10.1145/325164.325138","DOIUrl":"https://doi.org/10.1145/325164.325138","url":null,"abstract":"In an attempt to reduce the number of operand memory references, many RISC (reduced-instruction-set-computer) machines have 32 or more general-purpose registers (e.g. MIPS, ARM, Spectrum, 88 K). Without special compiler optimizations, such as inlining or interprocedural register allocation, it is rare that a computer will use a majority of these registers for a function. The authors explore the possibility of using some of these registers to hold branch target addresses and the corresponding instruction at each branch target. To evaluate the effectiveness of this scheme, two machines were designed and emulated. One machine had 32 general-purpose registers used for data references, while the other machine had 16 data registers and 16 registers used for branching. The results show that using registers for branching can effectively reduce the cost of transfers of control.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"2010 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125609027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
Synchronization with multiprocessor caches 与多处理器缓存同步
Joonwon Lee, U. Ramachandran
{"title":"Synchronization with multiprocessor caches","authors":"Joonwon Lee, U. Ramachandran","doi":"10.1145/325164.325107","DOIUrl":"https://doi.org/10.1145/325164.325107","url":null,"abstract":"A new lock-based cache scheme which incorporates synchronization into the cache coherency mechanism is presented. With this scheme high-level synchronization primitives, as well as low-level ones, can be implemented without excessive overhead. Cost functions for well-known synchronization methods are derived for invalidation schemes, write update schemes, and the authors' lock-based scheme. To predict the performance implications of the new scheme accurately, a new simulation model embodying a widely accepted paradigm of parallel programming is developed. It is shown that that authors' lock-based protocol outperforms existing cache protocols.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130595396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
The K2 parallel processor: architecture and hardware implementation K2并行处理器:体系结构和硬件实现
M. Annaratone, Marco Fillo, K. Nakabayashi, M. Viredaz
{"title":"The K2 parallel processor: architecture and hardware implementation","authors":"M. Annaratone, Marco Fillo, K. Nakabayashi, M. Viredaz","doi":"10.1145/325164.325118","DOIUrl":"https://doi.org/10.1145/325164.325118","url":null,"abstract":"K2 is a distributed-memory parallel processor designed to support a multiuser, multitasking, time-sharing operating system and an automatically parallelizing Fortran compiler. The architecture and the hardware implementation of K2 are presented. The authors focus on the architectural features required by the operating system and the compiler. A prototype machine with 24 processors is currently being developed.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131086499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 18
Architectural support for the management of tightly-coupled fine-grain goals in Flat Concurrent Prolog 在扁平并发Prolog中对紧耦合细粒度目标管理的体系结构支持
L. Alkalaj, T. Lang, M. Ercegovac
{"title":"Architectural support for the management of tightly-coupled fine-grain goals in Flat Concurrent Prolog","authors":"L. Alkalaj, T. Lang, M. Ercegovac","doi":"10.1145/325164.325155","DOIUrl":"https://doi.org/10.1145/325164.325155","url":null,"abstract":"Architectural support is proposed for goal management as part of a special-purpose processor architecture for the efficient execution of Flat Concurrent Prolog. Goal management operations, namely, halt, spawn, suspend, and commit, are decoupled from goal reduction and overlapped in the goal management unit. Their efficient execution is enabled using a goal cache. The authors evaluate the performance of the goal management support using an analytic performance model and program parameters characteristic of the system's development workload. Most goal management operations are completely overlapped, resulting in a speedup of 2. Higher speedups are obtained for workloads that exhibit greater goal management complexity.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114107237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
VAX vector architecture VAX矢量结构
D. Bhandarkar, Richard Brunner
{"title":"VAX vector architecture","authors":"D. Bhandarkar, Richard Brunner","doi":"10.1145/325164.325145","DOIUrl":"https://doi.org/10.1145/325164.325145","url":null,"abstract":"The VAX architecture has been extended to include an integrated, register-based vector processor. This extension allows both high-end and low-end implementations and can be supported with only small changes by VAX/VMS and VAX/ULTRIX operating systems. The extension is effectively exploited by the new vectorizing capabilities of VAX Fortran. Features of the VAX vector architecture and the design decisions which make it a consistent extension of the VAX architecture are discussed.<<ETX>>","PeriodicalId":297046,"journal":{"name":"[1990] Proceedings. The 17th Annual International Symposium on Computer Architecture","volume":"38 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114127130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信