Thread Affinity in Software Transactional Memory
Douglas Pereira Pasqualin, M. Diener, A. R. D. Bois, M. Pilla
2020 19th International Symposium on Parallel and Distributed Computing (ISPDC), July 2020. DOI: 10.1109/ISPDC51135.2020.00033

Abstract: Software Transactional Memory (STM) is an abstraction to synchronize accesses to shared resources. It simplifies parallel programming by replacing the use of explicit locks and synchronization mechanisms with atomic blocks. A well-known approach to improve the performance of STM applications is to serialize transactions to avoid conflicts using schedulers and mapping algorithms. However, in current architectures with complex memory hierarchies it is also important to consider where the memory of the program is allocated and how it is accessed. An important technique for improving memory locality is to map the threads and data of an application based on their memory access behavior, a technique called sharing-aware mapping. In this paper, we introduce a method to detect sharing behavior directly inside the STM library by tracking and analyzing how threads perform STM operations. This information is then used to compute an optimized mapping of the application's threads to cores in order to improve the efficiency of STM operations. Experimental results with the STAMP benchmarks show performance gains of up to 9.7x (1.4x on average) and a reduction in the number of aborts of up to 8.5x, compared to the Linux scheduler.

{"title":"Robustness Analysis of Scaled Resource Allocation Models Using the Imperial PEPA Compiler","authors":"W. Sanders, Srishti Srivastava, I. Banicescu","doi":"10.1109/ISPDC51135.2020.00018","DOIUrl":"https://doi.org/10.1109/ISPDC51135.2020.00018","url":null,"abstract":"The increase in scale provided by distributed computing systems has expanded scientific discovery and engineering solutions. Stochastic modeling with Performance Evaluation Process Algebra (PEPA) has been used to evaluate the robustness of static resource allocations in parallel and distributed computing systems. These evaluations have previously been performed through the PEPA Plug-In for the Eclipse Integrated Development Environment and have been limited by factors that include: i) the size and complexity of the underlying, in-use PEPA model, ii) a small number of resource allocation models available for analysis, and iii) the human interaction necessary to configure the PEPA Eclipse Plug-In, thus limiting potential automation. As the size and complexity of the underlying PEPA models increases, the number of states to be evaluated for each model also greatly increases, leading to a case of state space explosion. In this work, we validate the Imperial PEPA Compiler (IPC) as a replacement for the PEPA Eclipse Plug-In for the robustness analysis of resource allocations. We make available an implementation of the IPC as a Singularity container, as part of a larger online repository of PEPA resources. We then develop and test a programmatic method for generating PEPA models for resource allocations. When combined with our IPC container, this method allows automated analysis of resource allocation models at scale. The use of the IPC allows the evaluation of larger models than it is possible when using the PEPA Eclipse Plug-In. Moreover, the increases in scale in both model size and number of models, support the development of improved makespan targets for robustness metrics, including those among applications subject to perturbations at runtime, as found in typical parallel and distributed computing environments.","PeriodicalId":426824,"journal":{"name":"2020 19th International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114894311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating Minimal Nondeterministic Finite Automata Using a Parallel Algorithm","authors":"Tomasz Jastrząb, Z. Czech, Wojciech Wieczorek","doi":"10.1109/ISPDC51135.2020.00015","DOIUrl":"https://doi.org/10.1109/ISPDC51135.2020.00015","url":null,"abstract":"The goal of this paper is to develop a parallel algorithm that, on input of a learning sample, identifies a regular language by means of a nondeterministic finite automaton (NFA). A sample is a pair of finite sets containing positive and negative examples. Given a sample, a minimal NFA or the range of possible sizes of such an NFA, that represents the target regular language is sought. We define the task of finding an NFA, which accepts all positive examples and rejects all negative ones, as a constraint satisfaction problem, and then propose a parallel algorithm to solve the problem. The results of computational experiments on the variety of test samples are reported.","PeriodicalId":426824,"journal":{"name":"2020 19th International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121963542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neutrino: Efficient InfiniBand Access for Java Applications","authors":"Filip Krakowski, Fabian Ruhland, M. Schöttner","doi":"10.1109/ISPDC51135.2020.00012","DOIUrl":"https://doi.org/10.1109/ISPDC51135.2020.00012","url":null,"abstract":"Fast networks like InfiniBand are important for large-scale applications and big data analytics. Current InfiniBand hardware offers bandwidths of up to 200 Gbit/s with latencies of less than two microseconds. While it is mainly used in high performance computing, there are also some applications in the field of big data analytics. In addition, some cloud providers are offering instances equipped with InfiniBand hardware. Many big data applications and frameworks are written using the Java programming language, but the Java Development Kit does not provide native support for InfiniBand. To this end we propose neutrino, a network library providing comfortable and efficient access to InfiniBand hardware in Java as well as epoll based multithreaded connection management. Neutrino supports InfiniBand message passing as well as remote direct memory access, is implemented using the Java Native Interface, and can be used with any Java Virtual Machine. It also provides access to native C structures via a specially developed proxy system, which in turn enables the developer to leverage the InfiniBand hardware’s full functionality. Our experiments show that efficient access to InfiniBand hardware from within a Java Virtual Machine is possible while fully utilizing the available bandwidth.","PeriodicalId":426824,"journal":{"name":"2020 19th International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121616296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development and benchmarking a parallel Data AcQuisition framework using MPI with hash and hash+tree structures in a cluster environment
P. Czarnul, Grzegorz Golaszewski, Grzegorz Jereczek, M. Maciejewski
2020 19th International Symposium on Parallel and Distributed Computing (ISPDC), July 2020. DOI: 10.1109/ISPDC51135.2020.00031

Abstract: In this paper, we propose a solution that uses either a 3-layered index structure based on hash tables or a hash+tree structure for efficient parallel processing of data in a Data AcQuisition (DAQ) system. The proposed framework allows parallel data writes from multiple multithreaded client processes to multiple multithreaded server processes that use a thread-safe hash-table-based library. Communication is conducted using an MPI_THREAD_MULTIPLE-enabled MPI implementation. We demonstrate that the solution scales well in two cluster configurations using InfiniBand, specifically for increasing numbers of client as well as server threads. We present how performance depends on various configuration parameters of a DAQ system, such as the data distribution across the readout system, its size, and the percentage of data to be fetched. Furthermore, we show how it depends on the size of the value associated with a given write/read. We compare the performance of both proposed data structures for different configurations. The results allow the reader to learn real performance numbers and characteristics of such a solution, applicable to large-scale parallel data processing in a DAQ system, and to choose the optimal solution.

{"title":"Message from the ISPDC 2020 Chairs - ISPDC 2020","authors":"","doi":"10.1109/ispdc51135.2020.00005","DOIUrl":"https://doi.org/10.1109/ispdc51135.2020.00005","url":null,"abstract":"","PeriodicalId":426824,"journal":{"name":"2020 19th International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128059764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ISPDC 2020 Commentary","authors":"","doi":"10.1109/ispdc51135.2020.00001","DOIUrl":"https://doi.org/10.1109/ispdc51135.2020.00001","url":null,"abstract":"","PeriodicalId":426824,"journal":{"name":"2020 19th International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114872824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sponsors: ISPDC 2020","authors":"","doi":"10.1109/ispdc51135.2020.00010","DOIUrl":"https://doi.org/10.1109/ispdc51135.2020.00010","url":null,"abstract":"","PeriodicalId":426824,"journal":{"name":"2020 19th International Symposium on Parallel and Distributed Computing (ISPDC)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114979119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing the Efficiency of Hybrid Codes
Judit Giménez, Estanislao Mercadal, Germán Llort, Sandra Méndez
2020 19th International Symposium on Parallel and Distributed Computing (ISPDC), July 2020. DOI: 10.1109/ISPDC51135.2020.00014

Abstract: Hybrid parallelization may be the only path for most codes to use HPC systems at very large scale. Even at smaller scales, with an increasing number of cores per node, combining MPI with a shared-memory, thread-based library makes it possible to reduce the application's network requirements. Despite the benefits of a hybrid approach, it is not easy to achieve an efficient hybrid execution, not only because of the added complexity of combining two different programming models, but also because in many cases the code was initially designed with just one level of parallelization and later extended to a hybrid mode. This paper presents our model for diagnosing the efficiency of hybrid applications, distinguishing the contribution of each parallel programming paradigm. The flexibility of the proposed methodology allows us to use it for different paradigms and scenarios, such as comparing the MPI+OpenMP and MPI+CUDA versions of the same code.

Dynamic Load Balancing Based on Multi-Objective Extremal Optimization
I. D. Falco, E. Laskowski, R. Olejnik, U. Scafuri, E. Tarantino, M. Tudruj
2020 19th International Symposium on Parallel and Distributed Computing (ISPDC), July 2020. DOI: 10.1109/ISPDC51135.2020.00027

Abstract: This paper studies multi-objective algorithms, based on the nature-inspired approach of Extremal Optimization (EO), for distributed processor load balancing. EO defines task migrations aimed at balancing processor load during the execution of graph-represented distributed programs. In the multi-objective EO approach, three objectives relevant to distributed processor load balancing are controlled simultaneously: a function dealing with the computational load imbalance across processors, a function concerned with the communication between tasks placed on distinct computing nodes, and a function related to the number of task migrations. An important aspect of the proposed multi-objective approach is the method for selecting the best solutions from the Pareto set. Pareto front analysis based on a compromise-solution approach, a lexicographic approach, and a hybrid approach (lexicographic + numerical threshold) has been performed, depending on the program graph features, the executive system characteristics, and the experimental setting. The algorithms are assessed by simulation experiments with macro data flow graphs of programs run in distributed systems. The experiments have shown that the multi-objective EO approach included in the load balancing algorithms visibly improves the quality of program execution.
