{"title":"Fast multidimensional binary image processing with OpenCL","authors":"Daniel Oliveira Dantas, H. Leal","doi":"10.1109/HPCS48598.2019.9188210","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188210","url":null,"abstract":"Binary images are often used in image processing pipelines, and are usually stored in the unpacked format, i.e., 8 bits per pixel. The packed format uses 1 bit per pixel, 8 times less memory, and is a good option when dealing with images too big to fit in RAM. This paper presents a parallel implementation of pixelwise and window operators for packed binary images. The implementation, written in OpenCL, can run in GPUs or multiple core CPUs. The proposed Destination Word Accumulation (DWA) implementation of morphological operations is faster than Leptonica in 2D and up to two orders of magnitude faster than Python and MATLAB in 1D to 5D.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134600779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic Runtime Guarantees for Statically Scheduled Taskgraphs with Stochastic Task Runtimes","authors":"J. Keller, Sebastian Litzinger, Wolfgang Spitzer","doi":"10.1109/HPCS48598.2019.9188194","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188194","url":null,"abstract":"Tasks with stochastic runtimes and dependencies are frequently met in multicore applications, but static schedulers need deterministic task runtimes as input. We first demonstrate by scheduling experiments that both for binomially and geometrically distributed task runtimes, which are often found in taskgraphs, choice of average task runtime as scheduler input is sufficient to obtain schedules with good average makespan, i.e. that inserting runtime buffers depending on the standard deviation of task runtimes is not helpful in the majority of cases. Furthermore, we compute discretized makespan distributions for schedules with binomially and geometrically distributed runtimes as frequently occuring distributions. Thus, applications where probabilistic makespan guarantees with quantiles (vs. worst case execution times) are usable can profit from our analysis by starting with sampling their makespan distribution to approximate mean and standard deviation, and using our tool to compute the makespan distribution. As a side effect, we see that the rule of thumb “makespan is below average plus three (one) standard deviations in 99% of cases for binomially (geometrically) distributed runtimes” still apply, although makespans are not binomially or geometrically distributed but exhibit heavy tails. We also show how to mathematically derive makespan distribution for taskgraphs with stochastic task runtimes for different distributions, if stronger guarantees are needed.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133955892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Memory Graph Representation for Load Balancing Data: Accelerating Data Structure Generation for Decentralized Scheduling","authors":"Vinicius Freitas, A. Santana, M. Castro, L. Pilla","doi":"10.1109/HPCS48598.2019.9188134","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188134","url":null,"abstract":"In this paper, we propose a Distributed Graph Model (DGM) and data structure to enable communication-aware heuristics in distributed load balancers (LBs). DGM is motivated by the desire to maintain and use information related to the affinity between tasks (their communication) in order to improve data locality while scheduling tasks in a distributed fashion to avoid the centralization overhead. Results show that DGM is able to achieve speedups of up to 50.4x with 40 virtual cores, when compared to a centralized graph representation with the same purpose. Additionally, we propose a proof-of-concept distributed scheduler that uses DGM, named Edge Migration, and its implementation in the Charm++ parallel programming model. These results show that, although the communication analysis is much faster with DGM, it is still the most relevant overhead in distributed LBs. We also observe that Edge Migration has a decision time in the same order of magnitude as other communication-unaware decentralized algorithms. Thus, DGM can be used in communication-aware distributed LBs to improve load balancing decisions with a small impact in the overall LB performance.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133497891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On server-side file access pattern matching","authors":"F. Boito, Ramon Nou, L. Pilla, J. L. Bez, J. Méhaut, Toni Cortes, P. Navaux","doi":"10.1109/HPCS48598.2019.9188092","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188092","url":null,"abstract":"In this paper, we propose a pattern matching approach for server-side access pattern detection for the HPC I/O stack. More specifically, our proposal concerns file-level accesses, such as the ones made to I/O libraries, I/O nodes, and the parallel file system servers. The goal of this detection is to allow the system to adapt to the current workload. Compared to existing detection techniques, ours differ by working at run-time and on the server side, where detailed application information is not available since HPC I/O systems are stateless, and without relying on previous traces. We build a time series to represent accesses spatiality, and use a pattern matching algorithm, in addition to an heuristic, to compare it to known patterns. We detail our proposal and evaluate it with two case studies – situations where detecting the current access pattern is important to select the best scheduling algorithm or to tune a fixed algorithm parameter. We show our approach has good detection capabilities, with precision of up to 93% and recall of up to 99%, and discuss all design choices.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133315378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autonomic management experiences in structured parallel programming","authors":"M. Danelutto, D. D. Sensi, G. Mencagli, M. Torquati","doi":"10.1109/HPCS48598.2019.9188228","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188228","url":null,"abstract":"Structured parallel programming models based on parallel design patterns are gaining more and more importance. Several state-of-the-art industrial frameworks build on the parallel design pattern concept, including Intel TBB and Microsoft PPL. In these frameworks, the explicit exposition of parallel structure of the application favours the identification of the inefficiencies, the exploitation of techniques increasing the efficiency of the implementation and ensures that most of the more critical aspects related to an efficient exploitation of the available parallelism are moved from application programmers to framework designers. The very same exposition of the graph representing the parallel activities enables framework designers to emplace efficient autonomic management of non functional concerns, such as performance tuning or power management. In this paper, we discuss how autonomic management features evolved in different structured parallel programming frameworks based on the algorithmic skeletons and parallel design patterns. We show that different levels of autonomic management are possible, ranging from simple provisioning of mechanisms suitable to support programmers in the implementation of ad hoc autonomic managers to the complete autonomic managers whose behaviour may be programmed using high level rules by the application programmers.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124092079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Surrogate-Assisted Optimization for Multi-stage Optimal Scheduling of Virtual Power Plants","authors":"M. Gobert, Jan Gmys, J. Toubeau, F. Vallée, N. Melab, D. Tuyttens","doi":"10.1109/HPCS48598.2019.9188065","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188065","url":null,"abstract":"This paper presents a comparison between two surrogate-assisted optimization methods dealing with two-stage stochastic programming. The Efficient Global Optimization (EGO) framework is challenging a method coupling Genetic Algorithm (GA) and offline-learnt kriging model for the lower stage optimization. The objective is to prove the good behavior of bayesian optimization (and in particular EGO) applied to a real-world two-stage problem with strong dependencies between the stages. The problem consists in determining the optimal strategy of an electricity market player participating in reserve (first stage) as well as day-ahead energy and real-time markets (second stage). The decisions optimized at the first stage induce constraints on the second stage so that both stages can not be dissociated. One additional difficulty is the stochastic aspect due to uncertainties of several parameters (e.g. renewable energy-based generation) that requires more computational power to be handled. Surrogate models are introduced to deal with that additional computational burden. Experiments show that the EGO-based approach gives better results than GA with offline kriging model using smaller budget.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124332202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging energy-efficient non-lossy compression for data-intensive applications","authors":"Issam Raïs, Daniel Balouek-Thomert, Anne-Cécile Orgerie, L. Lefèvre, M. Parashar","doi":"10.1109/HPCS48598.2019.9188058","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188058","url":null,"abstract":"The continuous increase of data volumes poses several challenges to established infrastructures in terms of resource management and expenses. One of the most important challenges is the energy-efficient enactment of data operations in the context of data-intensive applications. Computing, generating and exchanging growing volumes of data are costly operations, both in terms of time and energy. In the late literature, different types of compression mechanisms emerge as a new way to reduce time spent on data-related operations, but the overall energy cost has not been studied. Based on current advances and benefits of compression techniques, we propose a model that leverages non-lossy compression and identifies situations where compression presents an interest from an energy reduction perspective. The proposed model considers sender, receiver, communications costs over various types of files and available bandwidth. This strategy allows us to improve both time and energy required for communications by taking advantage of idle times and power states. Evaluation is performed over HPC, Big Data and datacenter scenarios. Results show significant energy savings for all types of file while avoiding counter performances, resulting in a strong incentive to actively leverage non-lossy compression using our model.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114820384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware Acceleration of Kalman Filter for Leak Detection in Water Pipeline Systems using Wireless Sensor Network","authors":"F. Karray, Melek Maalaoui, A. Obeid, A. Ortiz, M. Abid","doi":"10.1109/HPCS48598.2019.9188156","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188156","url":null,"abstract":"The world migration towards automatic and wire-less systems results an increased usage of Wireless Sensor Networks (WSNs). The noticeable popularity of WSNs has imposed enlarged computational in-node demands. Hence, the recourse to fully-integrated and sophisticated systems with low power is a challenging task. Since wireless sensor nodes have limited power resources, it is important to find a balance between energy consumption and computational performance. The traditional software optimizations are not usually suited or enough to find this tradeoff. Consequently, the use of codesign methodology and the careful implementation of hardware accelerator with low frequency processors could offer a good compromise between energy consumption and performance. In this paper, we present a SoC WSN node prototype based on Leon 3 processor for leak detection in water pipeline using Kalman Filter (KF). A hardware acceleration of the KF has been designed and implemented to reduce energy consumption. We have compared also the software implementation of the algorithm and its hardware acceleration in terms of the execution time, the energy consumption and the area requirements. The results show about 97% reduction in energy consumption and execution time without noticeable increased area.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114715752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Matching of Regular Expressions with BSP Automata","authors":"Thibaut Tachon","doi":"10.1109/HPCS48598.2019.9188181","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188181","url":null,"abstract":"Regular expression matching is a core component of many applications including patterns search in text, deep inspection of packet or lexical analysis. Sequential regular expression matching lacks efficiency for large amount of data whereas parallel regular expression matching overhead requires a large number of processors to become negligible. This paper presents a transformation from regular expression (RE) into a parallel form named BSP regular expression (BSPRE). This transformation added to the transformation from BSPRE to parallel automata (BSPA) enable the parallel matching of regular expression. We compare this approach to enumeration method and observe substantial improvement for small number of processors. The automatic transformation from RE to BSPA through BSPRE is the first example of an infinite family of BSP programs that can be generated automatically and that are not simple specializations of a finite library.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116965199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PySke: Algorithmic Skeletons for Python","authors":"Jolan Philippe, F. Loulergue","doi":"10.1109/HPCS48598.2019.9188151","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188151","url":null,"abstract":"PySke is a library of parallel algorithmic skeletons in Python designed for list and tree data structures. Such algorithmic skeletons are high-order functions implemented in parallel. An application developed with PySke is a composition of skeletons. To ease the write of parallel programs, PySke does not follow the Single Program Multiple Data (SPMD) paradigm but offers a global view of parallel programs to users. This approach aims at writing scalable programs easily. In addition to the library, we present experiments performed on a highperformance computing cluster (distributed memory) on a set of example applications developed with PySke.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117324137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}