Jiayuan Meng, Xingfu Wu, V. Morozov, V. Vishwanath, Kalyan Kumaran, V. Taylor
{"title":"SKOPE: a framework for modeling and exploring workload behavior","authors":"Jiayuan Meng, Xingfu Wu, V. Morozov, V. Vishwanath, Kalyan Kumaran, V. Taylor","doi":"10.1145/2597917.2597928","DOIUrl":"https://doi.org/10.1145/2597917.2597928","url":null,"abstract":"Understanding workload behavior plays an important role in performance studies. The growing complexity of applications and architectures has increased the gap among application developers, performance engineers, and hardware designers. To reduce this gap, we propose SKOPE, a SKeleton framework for Performance Exploration, that produces a descriptive model of the semantic behavior of a workload, which can infer potential transformations and help users understand how workloads may interact with and adapt to emerging hardware. SKOPE models can be shared, annotated, and studied by a community of performance engineers and system designers; they offer readability in the frontend and versatility in the backend. SKOPE can be used for performance analysis, tuning, and projection. We provide two example use cases. First, we project GPU performance from CPU code without GPU programming or accessing the hardware, and are able to automatically explore transformations; the projected best-achievable performance deviates from the measured performance by 18% on average. Second, we project the multi-node scaling trends of two scientific workloads, and are able to achieve a projection accuracy of 95%.","PeriodicalId":194910,"journal":{"name":"Proceedings of the 11th ACM Conference on Computing Frontiers","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128113626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ayman Tarakji, Niels Ole Salscheider, Stephan Alt, Jan Heiducoff
{"title":"Feature-based device selection in heterogeneous computing systems","authors":"Ayman Tarakji, Niels Ole Salscheider, Stephan Alt, Jan Heiducoff","doi":"10.1145/2597917.2597927","DOIUrl":"https://doi.org/10.1145/2597917.2597927","url":null,"abstract":"With the advent of accelerator-based heterogeneous parallel systems, the need for a solution to the task-device matching problem is increasing. Due to the enormously growing diversity in existing computing architectures, optimal matching promises to deliver high performance at reduced energy costs. By means of OpenCL and particularly the LLVM compiler infrastructure, our approach makes the task-device matching decisions taking into account the characteristics and particularities of the different processing hardware. We evaluate our approach using a set of OpenCL-based real-world applications and well-established benchmarks, which are run on different hardware platforms and architectures. Our results indicate highly accurate predictions made by our model during the matching procedure.","PeriodicalId":194910,"journal":{"name":"Proceedings of the 11th ACM Conference on Computing Frontiers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130555932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAMO: store aware memory optimizations","authors":"K. Raghavendra, Tripti S. Warrier, M. Mutyam","doi":"10.1145/2597917.2597940","DOIUrl":"https://doi.org/10.1145/2597917.2597940","url":null,"abstract":"Cache optimizations and DRAM scheduling play an important role in determining the performance of a system given that the demand for memory is ever increasing. In this paper we track stores at both the cache and main memory and apply three optimizations: first, at the cache level, so that stores are serviced faster and load-store queue block cycles are reduced; second, at the miss handling architecture, where we remove entries containing only store requests, thereby reducing the cache stall cycles; and third, at the main memory, where stores are serviced with lower priority so that actual reads get serviced faster. Combined, these three memory optimizations (the store aware memory optimization, or SAMO, framework) increase the performance of the system on average and can be augmented with any previously proposed optimization techniques at the memory. SAMO speeds up the workloads on 4- and 8-core systems by a geometric mean of 5.0% and 7.4%, respectively, with a maximum speed-up of 21.9% and 17.8% on 4- and 8-core systems, respectively.","PeriodicalId":194910,"journal":{"name":"Proceedings of the 11th ACM Conference on Computing Frontiers","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130005828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Lam, S. Cunningham, S. Sreevatsan, Daniel Boley
{"title":"High throughput genetic sequence analysis","authors":"H. Lam, S. Cunningham, S. Sreevatsan, Daniel Boley","doi":"10.1145/2597917.2597957","DOIUrl":"https://doi.org/10.1145/2597917.2597957","url":null,"abstract":"We present an application paradigm in which an unsupervised machine learning approach is applied to high dimensional influenza sequence datasets: (1) human A/H3N2, (2) avian H5, and (3) North American swine influenza H3N2 virus. Interesting visual patterns observed in the A/H3N2 influenza virus led us to hypothesize that vaccination could be one of the driving forces in the evolution of the human A/H3N2 influenza virus. We provide a simulation study and statistical results to support our finding that the influenza virus evolves differently in a protected environment than it evolves in the wild. In the swine H3N2 case, our result suggests that the diversification of North American swine influenza virus can be attributed to the mutations at two positively selected sites on the hemagglutinin protein.","PeriodicalId":194910,"journal":{"name":"Proceedings of the 11th ACM Conference on Computing Frontiers","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133659869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Navaridas, M. Luján, L. Plana, S. Temple, S. Furber
{"title":"On generating multicast routes for SpiNNaker","authors":"J. Navaridas, M. Luján, L. Plana, S. Temple, S. Furber","doi":"10.1145/2597917.2597938","DOIUrl":"https://doi.org/10.1145/2597917.2597938","url":null,"abstract":"The human brain is an immense biological neural network characterized by high degrees of connectivity among neurons. Any system designed to simulate biologically-plausible spiking neuronal networks needs to support such connectivity and the associated communication traffic in the form of spike events. This paper demonstrates the adequacy of multicast communications to achieve such a demanding goal and introduces a collection of algorithms to generate multicast routes. These algorithms target the SpiNNaker interconnect, a two-dimensional triangular toroidal mesh with support for selective multicast. As generating multicast routes is an NP-complete problem, these algorithms are an essential ingredient for the efficient operation of SpiNNaker. Although multicast networks have been studied in the literature, existing algorithms cannot be applied efficiently to SpiNNaker. A comprehensive evaluation analyzing the largest configuration of the SpiNNaker system (over 1 million ARM cores) shows that each algorithm provides diverse benefits and drawbacks which can be exploited to avoid possible bottlenecks. Results show that the communication infrastructure of SpiNNaker will be able to support the high communication pressure exerted by simulating biologically plausible spiking neural applications in real time.","PeriodicalId":194910,"journal":{"name":"Proceedings of the 11th ACM Conference on Computing Frontiers","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127356410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
O. Serres, Abdullah Kayi, Ahmad Anbar, T. El-Ghazawi
{"title":"Hardware support for address mapping in PGAS languages: a UPC case study","authors":"O. Serres, Abdullah Kayi, Ahmad Anbar, T. El-Ghazawi","doi":"10.1145/2597917.2597945","DOIUrl":"https://doi.org/10.1145/2597917.2597945","url":null,"abstract":"The Partitioned Global Address Space (PGAS) programming model strikes a balance between the explicit, locality-aware, message-passing model and the locality-agnostic, but easy-to-use, shared memory model (e.g. OpenMP). However, the PGAS memory model comes at a performance cost which limits both scalability and performance. Compiler optimizations are often not sufficient, and manual optimizations are needed, which considerably limits the productivity advantage. This paper proposes hardware architectural support for PGAS, which allows the processor to efficiently handle shared addresses through new instructions. A prototype compiler is realized that allows the support to be used with unmodified code, preserving the PGAS productivity advantage. Speedups of up to 5.5x are demonstrated on the unmodified NAS Parallel Benchmarks using the Gem5 full system simulator.","PeriodicalId":194910,"journal":{"name":"Proceedings of the 11th ACM Conference on Computing Frontiers","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128522053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 11th ACM Conference on Computing Frontiers","authors":"","doi":"10.1145/2597917","DOIUrl":"https://doi.org/10.1145/2597917","url":null,"abstract":"","PeriodicalId":194910,"journal":{"name":"Proceedings of the 11th ACM Conference on Computing Frontiers","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123552955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}