{"title":"Architectural support for multilanguage parallel programming on heterogeneous systems","authors":"R. Bisiani, A. Forin","doi":"10.1145/36206.36180","DOIUrl":"https://doi.org/10.1145/36206.36180","url":null,"abstract":"We have designed and implemented a software facility, called Agora, that supports the development of parallel applications written in multiple languages. At the core of Agora there is a mechanism that allows concurrent computations to share data structures independently of the computer architecture they are executed on. Concurrent computations exchange control information by using a pattern-directed technique. This paper describes the Agora shared memory and its software implementation on both tightly and loosely-coupled architectures.","PeriodicalId":117067,"journal":{"name":"Proceedings of the second international conference on Architectual support for programming languages and operating systems","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124881969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An architecture for the direct execution of the Forth programming language","authors":"J. Hayes, M. Fraeman, Robert L. Williams, T. Zaremba","doi":"10.1145/36206.36182","DOIUrl":"https://doi.org/10.1145/36206.36182","url":null,"abstract":"We have developed a simple direct execution architecture for a 32 bit Forth microprocessor. The processor can directly access a linear address space of over 4 gigawords. Two instruction types are defined; a subroutine call, and a user defined microcode instruction. On-chip stack caches allow most Forth primitives to execute in a single cycle.","PeriodicalId":117067,"journal":{"name":"Proceedings of the second international conference on Architectual support for programming languages and operating systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114572841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effect of instruction set complexity on program size and memory performance","authors":"J. Davidson, R. A. Vaughan","doi":"10.1145/36206.36184","DOIUrl":"https://doi.org/10.1145/36206.36184","url":null,"abstract":"One potential disadvantage of a machine with a reduced instruction set is that object programs may be substantially larger than those for a machine with a richer, more complex instruction set. The main reason is that a small instruction set will require more instructions to implement the same function. In addition, the tendency of RISC machines to use fixed length instructions with a few instruction formats also increases object program size. It has been conjectured that the resulting larger programs could adversely affect memory performance and bus traffic. In this paper we report the results of a set of experiments to isolate and determine the effect of instruction set complexity on cache memory performance and bus traffic. Three high-level language compilers were constructed for machines with instruction sets of varying degrees of complexity. Using a set of benchmark programs, we evaluated the effect of instruction set complexity had on program size. Five of the programs were used to perform a set of trace-driven simulations to study each machine's cache and bus performance. While we found that the miss ratio is affected by object program size, it appears that this can be corrected by simplying increasing the size of the cache. Our measurements of bus traffic, however, show that even with large caches, machines with simple instruction sets can expect substantially more main memory reads than machines with dense object programs.","PeriodicalId":117067,"journal":{"name":"Proceedings of the second international conference on Architectual support for programming languages and operating systems","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126634331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study of scalar compilation techniques for pipelined supercomputers","authors":"S. Weiss, James E. Smith","doi":"10.1145/36206.36191","DOIUrl":"https://doi.org/10.1145/36206.36191","url":null,"abstract":"This paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study the impact of the architecture (size of the register file) and of the hardware (size of instruction buffer) on the efficiency of loop unrolling. We also develop a methodology for classifying software pipelining techniques. For loop unrolling, a straightforward scheduling algorithm is shown to produce near-optimal results when not inhibited by recurrences or memory hazards. Software pipelining requires less hardware but also achieves less speedup. Finally, we show that the performance produced with a modified CRAY-1S scalar architecture and a code scheduler utilizing loop unrolling is comparable to the performance achieved by the CRAY-1S with a vector unit and the CFT vectorizing compiler.","PeriodicalId":117067,"journal":{"name":"Proceedings of the second international conference on Architectual support for programming languages and operating systems","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125885310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI assist for a multiprocessor","authors":"Bob Beck, B. Kasten, S. Thakkar","doi":"10.1145/36206.36179","DOIUrl":"https://doi.org/10.1145/36206.36179","url":null,"abstract":"Multiprocessors have long been of interest to computer community. They provide the potential for accelerating applications through parallelism and increased throughput for large multi-user system. Three factors have limited the commercial success of multiprocessor systems; entry cost, range of performance, and ease of application. Advances in very large scale integration (VLSI) and in computer aided design (CAD) have removed these limitations, making possible a new class of multiprocessor systems based on VLSI components. A set of requirements for building an efficient shared multiprocessor system are discussed, including: low-level mutual exclusion, interrupt distribution, inter-processor signaling, process dispatching, caching, and system configuration. A system that meets these requirements is described and evaluated.","PeriodicalId":117067,"journal":{"name":"Proceedings of the second international conference on Architectual support for programming languages and operating systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133583497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures","authors":"R. Rashid, A. Tevanian, Michael Young, D. Golub, R. Baron, David L. Black, W. Bolosky, Jonathan Chew","doi":"10.1145/36206.36181","DOIUrl":"https://doi.org/10.1145/36206.36181","url":null,"abstract":"This paper describes the design and implementation of virtual memory management within the CMU Mach Operating System and the experiences gained by the Mach kernel group in porting that system to a variety of architectures. As of this writing, Mach runs on more than half a dozen uniprocessors and multiprocessors including the VAX family of uniprocessors and multiprocessors, the IBM RT PC, the SUN 3, the Encore MultiMax, the Sequent Balance 21000 and several experimental computers. Although these systems vary considerably in the kind of hardware support for memory management they provide, the machine-dependent portion of Mach virtual memory consists of a single code module and its related header file. This separation of software memory management from hardware support has been accomplished without sacrificing system performance. In addition to improving portability, it makes possible a relatively unbiased examination of the pros and cons of various hardware memory management schemes, especially as they apply to the support of multiprocessors.","PeriodicalId":117067,"journal":{"name":"Proceedings of the second international conference on Architectual support for programming languages and operating systems","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116241357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel computers for graphics applications","authors":"Adam Levinthal, P. Hanrahan, Mike Paquette, J. Lawson","doi":"10.1145/36206.36202","DOIUrl":"https://doi.org/10.1145/36206.36202","url":null,"abstract":"Specialized computer architectures can provide better price/performance for executing image processing and graphics applications than general purpose designs. Two processors are presented that use parallel SIMD data paths to support common graphics data structures as primitive operands in arithmetic expressions. A variant of the C language has been implemented to allow high level language coding of user applications on these processors. High level programming support is designed into the processor architecture that implements parallel object data typing and parallel conditional evaluation in hardware.","PeriodicalId":117067,"journal":{"name":"Proceedings of the second international conference on Architectual support for programming languages and operating systems","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1987-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124473062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}