{"title":"Investigating scaling behaviour of monte carlo codes for dense matrix inversion","authors":"J. Strassburg, V. Alexandrov","doi":"10.1145/2133173.2133187","DOIUrl":"https://doi.org/10.1145/2133173.2133187","url":null,"abstract":"With the latest developments in the area of advanced computer architectures, we are already seeing large-scale machines at petascale level and are faced with the exascale computing challenge. All these require scalability at system, algorithmic and mathematical model level. In particular, efficient scalable algorithms are required to bridge the performance gap. Being able to predict application demeanour, performance and scalability of currently used software on new supercomputers of different architectures, varying sizes, and utilising alternative ways of intercommunication, can be of great benefit for researchers as well as application developers. This paper is concerned with scaling characteristics of Monte Carlo based algorithms for matrix inversion. The algorithmic behaviour on large-scale systems will be predicted with the help of an extreme-scale high-performance computing (HPC) simulator.","PeriodicalId":259517,"journal":{"name":"ACM SIGPLAN Symposium on Scala","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131823397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing a gaussian process learning algorithm in mixed parallel environment","authors":"V. Chandola, Ranga Raju Vatsavai","doi":"10.1145/2133173.2133176","DOIUrl":"https://doi.org/10.1145/2133173.2133176","url":null,"abstract":"In this paper, we present a scalability analysis of a parallel Gaussian process training algorithm to simultaneously analyze a massive number of time series. We study three different parallel implementations: using threads, MPI, and a hybrid implementation using threads and MPI. We compare the scalability for the multi-threaded implementation on three different hardware platforms: a Mac desktop with two quad-core Intel Xeon processors (16 virtual cores), a Linux cluster node with four quad-core 2.3 GHz AMD Opteron processors, and SGI Altix ICE 8200 cluster node with two quad-core Intel Xeon processors (16 virtual cores). We also study the scalability of the MPI based and the hybrid MPI and thread based implementations on the SGI cluster with 128 nodes (2048 cores). Experimental results show that the hybrid implementation scales better than the multi-threaded and MPI based implementations. The application of the proposed algorithm is demonstrated in analyzing massive remote sensing observation data. The hybrid implementation, using 1536 cores, can analyze a data set with over 4 million time series in nearly 5 seconds while the serial algorithm takes nearly 12 hours to process the same data set.","PeriodicalId":259517,"journal":{"name":"ACM SIGPLAN Symposium on Scala","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133584637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}