{"title":"GPU-Accelerated VoltDB: A Case for Indexed Nested Loop Join","authors":"A. Nguyen, M. Edahiro, S. Kato","doi":"10.1109/HPCS.2018.00046","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00046","url":null,"abstract":"Graphics Processing Units (GPUs) were traditionally designed for gaming purposes. New GPU hardware and new programming platforms for GPU applications have enabled GPUs to work as co-processors alongside Central Processing Units (CPUs) to speed up general-purpose applications. In this paper, we focus on the design and implementation of a GPU-accelerated indexed nested loop join (INLJ) for in-memory relational database management systems (RDBMSs). Previous studies have proposed novel approaches for using GPUs to improve the performance of the relational INLJ, but they were only implemented on simulation systems. Their performance in current industrial RDBMSs still needs to be clarified. To this end, we implement the GPU-accelerated INLJ algorithm and perform various experiments on that join in VoltDB, an in-memory commercial RDBMS. We also propose a method for handling skewed input data, which is a critical problem for the GPU INLJ.
Our evaluations indicate that, although the GPU-accelerated INLJ is 2-14X faster than the default INLJ of VoltDB, the memory copy between host and GPU memory is the major factor holding back the join's speedup.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115562307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
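The core of the indexed nested loop join the abstract describes can be sketched in plain Python. This is a CPU-side illustration of the join idea only; the function and variable names are illustrative and are not VoltDB's or the paper's API:

```python
def indexed_nested_loop_join(outer, index):
    """Indexed nested loop join: for each outer tuple, probe an index
    built on the inner relation instead of scanning the inner table."""
    result = []
    for key, payload in outer:
        for inner_payload in index.get(key, []):  # index probe
            result.append((key, payload, inner_payload))
    return result

# Build a hash index on the inner relation's join key.
inner = [(1, "a"), (2, "b"), (2, "c")]
index = {}
for key, payload in inner:
    index.setdefault(key, []).append(payload)

print(indexed_nested_loop_join([(2, "x"), (3, "y")], index))
# [(2, 'x', 'b'), (2, 'x', 'c')]
```

Skew matters here because a single hot key (e.g. key 2 repeated millions of times) makes one probe loop, and hence one GPU thread block, do far more work than the others.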
{"title":"Enhancing Loosely Schema-aware Entity Resolution with User Interaction","authors":"Giovanni Simonini, Luca Gagliardelli, Song Zhu, S. Bergamaschi","doi":"10.1109/HPCS.2018.00138","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00138","url":null,"abstract":"Entity Resolution (ER) is a fundamental task of data integration: it identifies different representations (i.e., profiles) of the same real-world entity in databases. Comparing all possible profile pairs through an ER algorithm has quadratic complexity. Blocking is commonly employed to avoid that: profiles are grouped into blocks according to some features, and ER is performed only for entities of the same block. Yet, devising blocking criteria and ER algorithms for data with high schema heterogeneity is a difficult and error-prone task calling for automatic methods and debugging tools. In our previous work, we presented Blast, an ER system that can scale practitioners' favorite Entity Resolution algorithms. In its current version, Blast has been devised to take full advantage of parallel and distributed computation as well (running on top of Apache Spark). It implements the state-of-the-art unsupervised blocking method based on automatically extracted loose schema information. On top of Blast, we build a GUI (Graphical User Interface) which allows users: (i) to visualize, understand, and (optionally) manually modify the automatically extracted loose schema information (i.e., injecting the user's knowledge into the system); and (ii) to retrieve resolved entities through a free-text search box, and to visualize the process that led to that result (i.e., the provenance).
Experimental results on real-world datasets show that these two functionalities can significantly enhance Entity Resolution results.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123610530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
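The blocking step that avoids the quadratic all-pairs comparison can be sketched as follows. The blocking key used here (last name token) is a deliberately simple stand-in, not Blast's loose-schema method; all names are illustrative:

```python
def block_profiles(profiles, blocking_key):
    """Group profiles into blocks by a key; ER then compares
    only profiles that share a block."""
    blocks = {}
    for pid, profile in profiles.items():
        blocks.setdefault(blocking_key(profile), set()).add(pid)
    return blocks

def candidate_pairs(blocks):
    """Enumerate the comparison pairs blocking actually produces."""
    pairs = set()
    for members in blocks.values():
        ms = sorted(members)
        for i in range(len(ms)):
            for j in range(i + 1, len(ms)):
                pairs.add((ms[i], ms[j]))
    return pairs

profiles = {
    1: {"name": "john smith"},
    2: {"name": "jon smith"},
    3: {"name": "mary jones"},
}
blocks = block_profiles(profiles, lambda p: p["name"].split()[-1])
print(candidate_pairs(blocks))  # {(1, 2)}
```

Three profiles would need three all-pairs comparisons; blocking on the surname reduces that to one, at the risk of missing matches whose key differs, which is exactly why a GUI for inspecting and correcting the extracted schema information is useful.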
{"title":"Parallel Algorithms for Multidimensional Data Streams Analysis with Tensor Subspace Models","authors":"B. Cyganek","doi":"10.1109/HPCS.2018.00139","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00139","url":null,"abstract":"In this paper, parallel models for processing multidimensional data streams are discussed. Stream analysis is performed with tensor models and a fitness measure. The method was tested on the problem of video shot detection, showing good accuracy. Efficient algorithms for tensor model construction and model update in a parallel processing framework are presented, and a parallel version for off-line stream processing is also proposed.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130227377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Global Choreography to Efficient Distributed Implementation","authors":"Rayan Hallal, Mohamad Jaber, Rasha Abdallah","doi":"10.1109/HPCS.2018.00122","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00122","url":null,"abstract":"We define a methodology to automatically synthesize an efficient distributed implementation starting from a high-level global choreography. A global choreography describes the communication logic between the interfaces of a set of predefined processes. The operations provided by the choreography (e.g., multiparty, choice, loop, branching) are master-triggered and conflict-free by construction (no conflicting parallel interleavings), which permits the generation of fully distributed implementations (i.e., no need for controllers). We apply our methodology by automatically synthesizing microservice architectures.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"117 24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126415074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Oil Thickness Estimation Using Single- and Dual- Frequency Maximum-Likelihood Approach","authors":"Bilal Hammoud, H. Ayad, M. Fadlallah, J. Jomaah, F. Ndagijimana, G. Faour","doi":"10.1109/HPCS.2018.00025","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00025","url":null,"abstract":"In this paper, we present maximum-likelihood single- and dual-frequency estimators for oil spill thickness estimation. The estimators apply a minimum-Euclidean-distance algorithm, over pre-defined 1-D or 2-D constellation sets of simulated reflectivities, to estimate the thickness of the oil on top of the sea surface. Results show that the performance of the estimator depends on the radar frequency used and on the actual thickness level. The steep slope of the reflectivity at some frequencies allows accurate estimation in some thickness ranges, but its periodicity can cause significant errors in other ranges. The monotonic behavior of the reflectivity at other frequencies leads to less accurate estimation but a smaller error range. Performance analysis of the dual-frequency estimator shows that it overcomes the drawbacks of each single-frequency estimator by combining the monotonicity of the reflectivity at the first frequency with the steepness of its slope at the second frequency.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125243582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
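The minimum-Euclidean-distance decision over a constellation set can be sketched in a few lines. The reflectivity values below are invented purely for illustration and do not come from the paper:

```python
def min_distance_estimate(measured, constellation):
    """Return the thickness whose simulated reflectivity vector is
    closest (Euclidean distance) to the measured one."""
    best_t, best_d = None, float("inf")
    for thickness, reflectivity in constellation.items():
        d = sum((m - r) ** 2 for m, r in zip(measured, reflectivity)) ** 0.5
        if d < best_d:
            best_t, best_d = thickness, d
    return best_t

# Dual-frequency (2-D) constellation: thickness -> (reflectivity at f1, at f2).
# Numbers are made up for the sketch.
constellation = {0.5: (0.80, 0.30), 1.0: (0.75, 0.60), 2.0: (0.70, 0.20)}
print(min_distance_estimate((0.74, 0.55), constellation))  # 1.0
```

With a single frequency the constellation is 1-D, and two different thicknesses can map to nearly the same reflectivity when it is periodic; the second dimension is what lets the dual-frequency estimator disambiguate them.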
{"title":"Parallel Simulation of Electrophoretic Deposition for Industrial Automotive Applications","authors":"Kevin Verma, Luis Ayuso, R. Wille","doi":"10.1109/HPCS.2018.00080","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00080","url":null,"abstract":"Electrophoretic Deposition (EPD) coating is one of the key applications in automotive manufacturing. In recent years, tools based on Computational Fluid Dynamics (CFD) have been used to simulate the corresponding coating processes. However, the complex data used in this application frequently brings standard CFD applications to their limits. For that purpose, a CFD-based tool named ALSIM has been proposed, which employs a unique volumetric decomposition method that addresses these problems. However, certain characteristics of this methodology are drawbacks for the typical process used in this application, resulting in large execution times. In this work, we present a parallel scheme for this application which addresses these shortcomings. To this end, two layers of parallelism are introduced. Both are implemented using OpenMP, allowing for execution on shared-memory parallel architectures. Experimental evaluations confirm the scalability and efficiency of the proposed methods.
The simulation time of a typical use case in the automotive industry could be reduced from almost 6 days to 13 hours when employing 16 processing cores.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128467538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Improve Set Similarity Join Based on Prefix Approach in Distributed Environment","authors":"Song Zhu, Luca Gagliardelli, Giovanni Simonini, D. Beneventano","doi":"10.1109/HPCS.2018.00136","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00136","url":null,"abstract":"Set similarity join is an essential operation for finding similar pairs of records in data integration and data analytics applications. To cope with the increasing scale of the data, several techniques have been proposed to perform set similarity join using distributed frameworks (e.g., MapReduce). In particular, a MapReduce implementation of PPJoin, which has been experimentally demonstrated to be one of the best set similarity join algorithms, is publicly available. However, these techniques produce huge amounts of duplicates in order to enable parallel processing. Moreover, these approaches do not guarantee load balancing, which leads to skewness problems and limits the scalability of these techniques. To address these problems, we propose a duplicate-free technique called TTJoin, which performs set similarity join efficiently by utilizing an innovative filter derived from the prefix filter. Moreover, we implemented TTJoin on Apache Spark, one of the most innovative distributed frameworks.
Several experiments on real-world datasets demonstrate the effectiveness of the proposed solution with respect to the traditional MapReduce implementation.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127017402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
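The classic prefix filter that TTJoin builds on can be sketched as follows. This shows the standard filter for a Jaccard threshold, not the paper's derived TTJoin filter; the token ordering (lexicographic) is an illustrative choice:

```python
import math

def prefix(tokens, threshold):
    """Prefix filter for Jaccard threshold t: order each set by a
    global token order and keep the first len - ceil(t*len) + 1 tokens.
    Two sets can reach similarity t only if their prefixes intersect."""
    tokens = sorted(tokens)  # global token order (here: lexicographic)
    keep = len(tokens) - math.ceil(threshold * len(tokens)) + 1
    return tokens[:keep]

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

r, s = {"a", "b", "c", "d"}, {"w", "x", "y", "z"}
t = 0.8
# Disjoint prefixes -> the pair is pruned without computing Jaccard.
print(set(prefix(r, t)) & set(prefix(s, t)))  # set()
```

In a distributed setting each record is replicated once per prefix token, which is precisely the source of the duplicates (and the skew, when a prefix token is frequent) that the abstract says TTJoin avoids.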
{"title":"Experiments in Routing Vehicles for Municipal Services","authors":"Imran Mahmood, J. Zubairi, Sahar Idwan, Izzeddin Matar","doi":"10.1109/HPCS.2018.00156","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00156","url":null,"abstract":"In this paper, route planning of waste collection trucks using the Ruin and Recreate (R&R) approach is explored. We assume the trucks are guided by the central depot in selecting the optimal route for waste collection. Heuristic algorithms are simulated to find the optimal routes for the waste management fleet. Through the use of smart dumpsters that can communicate their current waste level using sensors and communication modules, we aim to reduce the number of trucks used, the total time taken, and the total distance traveled by the fleet in a day. Our work is based on variations of the CVRPTW (Capacitated Vehicle Routing Problem with Time Window constraints). The central management system selects the dumpsters, based on their waste levels, in descending order, and dispatches an appropriate number of trucks, with path assignments computed using the R&R approach of the VRPTW strategy. Through a simulation of smart waste collection, we show that the municipal authority saves transit time, fuel cost, and service time by using our approach.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130688546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
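The selection step the abstract describes (pick dumpsters above a fill threshold, in descending waste-level order) is simple to sketch; the names and threshold are illustrative, not the paper's:

```python
def select_dumpsters(levels, threshold):
    """Pick dumpsters whose reported fill level meets the threshold,
    ordered by descending fill level, as in the central dispatch step."""
    return sorted((d for d, lvl in levels.items() if lvl >= threshold),
                  key=lambda d: -levels[d])

# Fill levels reported by the smart dumpsters' sensors (fractions of capacity).
levels = {"D1": 0.9, "D2": 0.3, "D3": 0.7}
print(select_dumpsters(levels, 0.5))  # ['D1', 'D3']
```

The resulting list becomes the customer set of a CVRPTW instance; Ruin and Recreate then repeatedly removes part of a feasible solution and reinserts the removed stops to search for shorter routes.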
{"title":"Insights into Application-level Solutions towards Resilient MPI Applications","authors":"P. González, Nuria Losada, María J. Martín","doi":"10.1109/HPCS.2018.00101","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00101","url":null,"abstract":"Current petascale systems, formed by hundreds of thousands of cores, are highly dynamic, which causes hardware failure rates to be relatively high. Failure data collected from two large high-performance computing sites have been analysed in [1], showing failure rates from 20 to more than 1,000 failures per year, depending mostly on system size. This translates into a failure every 8.7 hours. Future exascale systems, formed by several millions of cores, will be hit by errors/faults even more frequently due to their scale and complexity [2]. Thus, long-running applications on these systems will need to use fault tolerance techniques to ensure successful execution completion.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"17 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132532234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing the Use of Genetic Algorithms to Schedule Independent Tasks Under Power Constraints","authors":"A. Kassab, J. Nicod, L. Philippe, V. Rehn-Sonigo","doi":"10.1109/HPCS.2018.00052","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00052","url":null,"abstract":"Green data and computing centers, i.e., centers using renewable energy sources, can be a valid solution to the ever-growing energy consumption of data and computing centers and their corresponding carbon footprint. Powering these centers solely with energy provided by renewable sources is however a challenge, because renewable sources (like solar panels and wind turbines) cannot guarantee a continuous feed due to their intermittent energy production. The high computation demand of HPC applications requires high power levels from the power supply. On the other hand, one advantage is that, unlike online applications, HPC applications can tolerate delaying the execution of some tasks. Since users nevertheless want their results as early as possible, minimum makespan is usually the main objective when scheduling this kind of job. The optimization problem of scheduling a set of tasks under power constraints is however proven to be NP-complete. Designing and assessing heuristics is hence the only way to propose efficient solutions. In this paper, we present genetic algorithms for scheduling sets of independent tasks in parallel, with the objective of minimizing the makespan under power availability constraints.
Extensive simulations show that genetic algorithms can compute good schedules for this problem.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131953851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
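A genetic algorithm for this kind of task-scheduling problem can be sketched as below. This is a toy version: a chromosome assigns each task to a machine and fitness is the makespan, while the paper's power-availability constraint is deliberately omitted for brevity (it would typically enter as a penalty term in the fitness). All parameter values are illustrative:

```python
import random

def makespan(assign, durations, n_machines):
    """Makespan of an assignment: the load of the busiest machine."""
    loads = [0.0] * n_machines
    for task, m in enumerate(assign):
        loads[m] += durations[task]
    return max(loads)

def ga_schedule(durations, n_machines, pop=30, gens=60, seed=0):
    """Toy GA: selection keeps the best half, children are built by
    one-point crossover plus occasional mutation."""
    rng = random.Random(seed)
    n = len(durations)
    popl = [[rng.randrange(n_machines) for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        popl.sort(key=lambda c: makespan(c, durations, n_machines))
        survivors = popl[: pop // 2]
        children = []
        while len(survivors) + len(children) < pop:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]           # one-point crossover
            if rng.random() < 0.2:              # mutation
                child[rng.randrange(n)] = rng.randrange(n_machines)
            children.append(child)
        popl = survivors + children
    best = min(popl, key=lambda c: makespan(c, durations, n_machines))
    return makespan(best, durations, n_machines)

print(ga_schedule([3, 1, 4, 1, 5, 2, 6], 3))
```

For these 7 tasks (total work 22) on 3 machines, any schedule's makespan lies between 8 (the load lower bound) and 22 (everything on one machine); the GA searches between those extremes.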