{"title":"In search of a map: using program slicing to discover potential parallelism in recursive functions","authors":"Adam D. Barwell, K. Hammond","doi":"10.1145/3122948.3122951","DOIUrl":"https://doi.org/10.1145/3122948.3122951","url":null,"abstract":"Recursion schemes, such as the well-known map, can be used as loci of potential parallelism, where schemes are replaced with an equivalent parallel implementation. This paper formalises a novel technique, using program slicing, that automatically and statically identifies computations in recursive functions that can be lifted out of the function and then potentially performed in parallel. We define a new program slicing algorithm, build a prototype implementation, and demonstrate its use on 12 Haskell examples, including benchmarks from the NoFib suite and functions from the standard Haskell Prelude. In all cases, we obtain the expected results in terms of finding potential parallelism. Moreover, we have tested our prototype against synthetic benchmarks, and found that our prototype has quadratic time complexity. For the NoFib benchmark examples we demonstrate that relative parallel speedups can be obtained (up to 32.93× the sequential performance on 56 hyperthreaded cores).","PeriodicalId":130146,"journal":{"name":"Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130720691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Shaikhha, A. Fitzgibbon, S. Jones, Dimitrios Vytiniotis
{"title":"Destination-passing style for efficient memory management","authors":"A. Shaikhha, A. Fitzgibbon, S. Jones, Dimitrios Vytiniotis","doi":"10.1145/3122948.3122949","DOIUrl":"https://doi.org/10.1145/3122948.3122949","url":null,"abstract":"We show how to compile high-level functional array-processing programs, drawn from image processing and machine learning, into C code that runs as fast as hand-written C. The key idea is to transform the program to destination-passing style, which in turn enables a highly-efficient stack-like memory allocation discipline.","PeriodicalId":130146,"journal":{"name":"Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123269013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From high-level radio protocol specifications to efficient low-level implementations via partial evaluation","authors":"G. Mainland, S. Shanmugam","doi":"10.1145/3122948.3122950","DOIUrl":"https://doi.org/10.1145/3122948.3122950","url":null,"abstract":"Software-defined radio (SDR) is a challenging domain for language designers. To be useful in the real world, radio protocol implementations must operate at high data rates with low latency, yet to be useful to implementers, a language should allow programmers to express algorithms at a high level of abstraction without having to worry about the very low-level details that are necessary for meeting performance requirements. Ziria demonstrated that a high-level language for writing wireless physical layer (PHY) protocols could be competitive with hand-written C++, but only in a context where performance-critical computations, such as FFT and Viterbi, were still written in C++ and accessed via a foreign function interface. We demonstrate that a new implementation of Ziria, embodied in the kzc compiler, allows even performance-critical blocks such as FFT and Viterbi to be written in a high-level language without sacrificing performance. Because kzc performs whole-program optimization, a radio protocol pipeline using an implementation of Viterbi written in Ziria can outperform an implementation that calls out to . The contributions of this paper fall into two categories. First, we describe two new optimizations in kzc, both of which are critical for wringing performance out of high-level code: an aggressive partial evaluator for Ziria programs, and an automatic lookup-table (LUT) generator. Second, we show how these optimizations allow the efficient implementation of three performance-critical blocks in Ziria: Viterbi decoding, the Fast Fourier Transform (FFT), and the inverse Fast Fourier Transform (IFFT).","PeriodicalId":130146,"journal":{"name":"Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134370811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strategies for regular segmented reductions on GPU","authors":"Rasmus W. Larsen, Troels Henriksen","doi":"10.1145/3122948.3122952","DOIUrl":"https://doi.org/10.1145/3122948.3122952","url":null,"abstract":"We present and evaluate an implementation technique for regular segmented reductions on GPUs. Existing techniques tend to be either consistent in performance but relatively inefficient in absolute terms, or optimised for specific workloads and thereby exhibiting bad performance for certain input. We propose three different strategies for segmented reduction of regular arrays, each optimised for a particular workload. We demonstrate an implementation in the Futhark compiler that is able to employ all three strategies and automatically select the appropriate one at runtime. While our evaluation is in the context of the Futhark compiler, the implementation technique is applicable to any library or language that has a need for segmented reductions. We evaluate the technique on four microbenchmarks, two of which we also compare to implementations in the CUB library for GPU programming, as well as on two application benchmarks from the Rodinia suite. On the latter, we obtain speedups ranging from 1.3× to 1.7× over a previous implementation based on scans.","PeriodicalId":130146,"journal":{"name":"Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134343281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VisPar: visualising dataflow graphs from the Par Monad","authors":"Maximilian Algehed, Patrik Jansson","doi":"10.1145/3122948.3122953","DOIUrl":"https://doi.org/10.1145/3122948.3122953","url":null,"abstract":"We present a work in progress tool (VisPar) for visualising computations in the Par monad in Haskell. Our contribution is not a revolutionary new idea but rather a modest addition to the set of tools available for making sense of parallel programs. We hope to show that VisPar can be useful as a teaching tool by providing visualisations of a few examples from a course on parallel functional programming.","PeriodicalId":130146,"journal":{"name":"Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124480513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing","authors":"","doi":"10.1145/3122948","DOIUrl":"https://doi.org/10.1145/3122948","url":null,"abstract":"","PeriodicalId":130146,"journal":{"name":"Proceedings of the 6th ACM SIGPLAN International Workshop on Functional High-Performance Computing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125423368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}