R. Thulasiram, P. Thulasiraman, C. Adiele, D. Bondarenko
{"title":"Performance analysis of a multithreaded pricing algorithm on Cilk","authors":"R. Thulasiram, P. Thulasiraman, C. Adiele, D. Bondarenko","doi":"10.1109/HPCSA.2002.1019164","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019164","url":null,"abstract":"In this paper, we develop a multithreaded algorithm for pricing simple options and implement it on an 8-node SMP machine using MIT's supercomputer programming language Cilk. The algorithm dynamically creates lots of threads to exploit parallelism and relies on the Cilk runtime system to distribute the computation load. We present both analytical and experimental results, and our results explain how Cilk could be used effectively to exploit parallelism in the given problem. The analytical results show that our algorithm has a very high average parallelism and hence Cilk is the target paradigm to implement the algorithm. We conclude from our implementation results that the size of the threads, the number of threads created, the load balancer, and the cost of spawning a thread are parameters that must be considered while designing the algorithm on the Cilk platform.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127875780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architectural extensions to support efficient communication using message prediction","authors":"A. Afsahi, N. Dimopoulos","doi":"10.1109/HPCSA.2002.1019130","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019130","url":null,"abstract":"With increasing uniprocessor and SMP computation power, workstation clusters are becoming viable alternatives to high performance computing systems. Communication overhead affects the performance of parallel computers significantly. A significant portion of the software communication overhead is attributable to message copying. We argue that it is possible to address the message copying problem at the receiving side through speculation. We show that messages display a form of locality, and we introduce the notion of message prediction for the receiving side of message-passing systems. By predicting a receive communication call before it is posted, we are able to place the required message directly into the cache speculatively before it is needed so that effectively a zero-copy communication can be achieved. Specific extensions to the ISA and the processor architecture accommodate late binding without requiring copying of the message.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131218018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A communication model for CORBA systems","authors":"R. Sáenz, B. d'Auriol","doi":"10.1109/HPCSA.2002.1019166","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019166","url":null,"abstract":"A multi-component communication analysis of the CORBA specification is conducted and an analytic model describing communication costs is proposed. The analysis indicates: a) potentially expensive communication when integrating parallel and distributed computing into a CORBA middleware framework, and b) a relationship between communication costs and code layout in CORBA's Interface Definition Language (IDL).","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128835820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-based algorithms for parallel processes","authors":"S. Yordanova","doi":"10.1109/HPCSA.2002.1019142","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019142","url":null,"abstract":"The aim of this paper is to present some graph-based algorithms for Finite State Processes (FSP). A Communicating Sequential Process generated by a Finite Transition System is called an FSP. If P and Q are FSPs, the parallel process P ∥ Q is also an FSP. We present an algorithm to construct the parallel process of a finite set of FSPs and an algorithm to construct a finite complete prefix.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132209479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Mello, Maria Stela Veludo de Paiva, L. Trevelin, A. Gonzaga
{"title":"Analysis on the significant information to update the tables on occupation of resources by using a peer-to-peer protocol","authors":"R. Mello, Maria Stela Veludo de Paiva, L. Trevelin, A. Gonzaga","doi":"10.1109/HPCSA.2002.1019159","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019159","url":null,"abstract":"The adequate occupation of the computing resources can decisively influence the global performance of the system. Therefore, in order to achieve high performance, it is mandatory to know all the computing resources involved and their respective occupation levels at a given moment. With the objective of improving the system performance, the paper presents the OpenTella model to update the information related to the occupation of resources and the respective analysis of this occupation so that the migration of processes among computers of the same cluster can be completed. With the objective of increasing the scale level in the system and decreasing the number of messages among the computers, this peer-to-peer protocol defines sub-nets, which are clusters that make up a more comprehensive cluster. Thus, groups are defined to interchange information and update the occupation of resources, in order to minimize the communication and to achieve a calculation to balance the load and meet the system needs, resulting in the migration of processes.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133350770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed rasterization using OpenGL","authors":"David Calvert, D. Thompson","doi":"10.1109/HPCSA.2002.1019171","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019171","url":null,"abstract":"This work examines the facility of using a large distributed memory system for rasterization of computer graphics using the OpenGL and GLUT libraries. Issues examined include the performance increases achieved through parallel processing and the effects of different methods for dividing the framebuffer over multiple processors.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132760817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel asynchronous Richardson method for the solution of obstacle problem","authors":"P. Spitéri, M. Chau","doi":"10.1109/HPCSA.2002.1019146","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019146","url":null,"abstract":"The present study deals with parallel asynchronous iterations applied to the numerical solution of the obstacle problem defined in a three-dimensional domain. For the considered problem, the convergence analysis of the algorithm is made. Finally, the implementation of the algorithms is presented and computational experiments on IBM-SP3 are analysed.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125620176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid system, a new low cost parallel cluster","authors":"Leo Chin Sim, Heiko Schröder, G. Leedham","doi":"10.1109/HPCSA.2002.1019131","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019131","url":null,"abstract":"This paper describes a new parallel architectural system which we call the Hybrid System. As the name implies, the Hybrid System is a combination of both SIMD and MIMD systems working concurrently. This new parallel architecture is capable of achieving greater speedup than MIMD can achieve alone and can also be more flexible than multiple SIMD on a single station. In this paper, we introduce our new SIMD concept and also show the contribution of the SIMD on the Hybrid System. We also developed a general formula for determining the speedup of the Hybrid System so that accurate predictions can be made on its performance. We establish this by comparing the measured speedup with the computed speedup for a volume rendering application.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130942203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient gather operation in heterogeneous cluster systems","authors":"Fukuhito Ooshita, Susumu Matsumae, T. Masuzawa","doi":"10.1109/HPCSA.2002.1019155","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019155","url":null,"abstract":"A heterogeneous cluster system consisting of different types of workstations and communication links plays an important role in parallel computing. In many applications on the system, collective communication operations are commonly used as communication primitives. Thus, design of the efficient collective communication operations is the key to achieve high-performance parallel computing. But the heterogeneity of the system complicates the design. In this paper, we consider design of an efficient gather operation, one of the most important collective operations. We show that an optimal gather schedule is found in O(n^{2k-1}) time for the heterogeneous cluster system with n processors of k distinct types, and that a nearly-optimal schedule is found in O(n) time if k = 2.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133878585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sung-Eun Choi, E. A. Hendriks, R. Minnich, M. Sottile, Aaron Marks
{"title":"Life with Ed: a case study of a linux BIOS/BProc cluster","authors":"Sung-Eun Choi, E. A. Hendriks, R. Minnich, M. Sottile, Aaron Marks","doi":"10.1109/HPCSA.2002.1019132","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019132","url":null,"abstract":"In this paper, we describe experiences with our 127-node/161-processor Alpha cluster testbed, Ed. Ed is unique for two distinct reasons. First, we have replaced the standard BIOS on the cluster nodes with the Linux BIOS which loads Linux directly from non-volatile memory (Flash RAM). Second, the operating system provides a single-system image of the entire cluster, much like a traditional supercomputer. We will discuss the advantages of such a cluster, including time to boot (101 seconds for 100 nodes), upgrade (same as time to boot), and start processes (2.4 seconds for 15,000 processes). Additionally, we have discovered that certain predictions about the nature of terascale clusters, such as the need for hierarchical structure, are false. Finally, we argue that to achieve true scalability, terascale clusters must be built in the way of Ed.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115406608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}