{"title":"Seamless GPU Evaluation of Smart Expression Templates","authors":"Baptiste Wicht, Andreas Fischer, J. Hennebert","doi":"10.1109/HPCS.2018.00045","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00045","url":null,"abstract":"Expression Templates is a technique allowing to write linear algebra code in C++ the same way it would be written on paper. It is also used extensively as a performance optimization technique, especially as the Smart Expression Templates form which allows for even higher performance. It has proved to be very efficient for computation on a Central Processing Unit (CPU). However, due to its design, it is not easily implemented on a Graphics Processing Unit (GPU). In this paper, we devise a set of techniques to allow the seamless evaluation of Smart Expression Templates on the GPU. The execution is transparent for the user of the library which still uses the matrices and vector as if it was on the CPU and profits from the performance and higher multi-processing capabilities of the GPU. We also show that the GPU version is significantly faster than the CPU version, without any change to the code of the user.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128692064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Symbolic Matrix Multiplication for Multithreaded Sparse GEMM Utilizing Sparse Matrix Formats","authors":"Marcel Richter, G. Rünger","doi":"10.1109/HPCS.2018.00088","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00088","url":null,"abstract":"Sparse matrices are exploited in many problems from scientific computing and, thus, their efficient implementation is crucial for the overall performance of the problems. Three sparse matrix formats, such as Compressed Sparse Row Storage, Block Sparse Row Storage and Ellpack-Itpack, have been proposed to support an efficient storage and access to sparse matrices. A specific challenge is to implement sparse matrices on parallel platforms and to support efficient access within parallel algorithms. This article is a contribution towards the efficient parallel execution of a multi-threaded general matrix-matrix multiplication (GEMM) using sparse matrices. Major considerations are based on the benefit and overhead of a symbolic GEMM prior to the sparse GEMM operation to obtain information about the result matrix structure. Hence, overhead regarding sorting, merging of data structures and memory allocation routines can be minimized to improve the runtime performance. Multi-threaded GEMM implementations are studied for different storage formats and their performance is investigated for a broad range of sparse test matrices on recent multicore architectures. A constraint of our approach is that the sparse GEMM should be performed such that the sparse matrix format is an invariant property and the result matrix of the GEMM operation is provided in the same format without matrix format changes.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133549028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying the Temporal Structure of Parallel Application Computation Phases","authors":"D. Dosimont, Harald Servat, M. Wagner, Judit Giménez, Jesús Labarta","doi":"10.1109/HPCS.2018.00087","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00087","url":null,"abstract":"Performance analysis tools are essential to help developers improve the performance of their parallel applications. These tools have widely embraced graphical representations to ease the analyst experience. However, they might mislead the analysis if using questionable aggregation techniques, especially when dealing with much data in timelines. In this paper, we have put efforts to demonstrate the value of information theory topics when applied to performance analysis. To this end, we extend a previously designed tool named folding which focuses on a detailed exploration of computation phases using trace files containing instrumented and sampled information. We design appropriate representations for the folding output by adopting an innovative aggregation technique based on information theory. As we will demonstrate through the paper, the original implementation of this tool may hinder the analysis by the introduction of some artifacts as a result of the chosen aggregation techniques. Additionally, we extend the folding tool to provide a decent analysis overview to start the analysis. Last, but not least, we successfully apply the new flow to two in-production HPC applications and characterize their performance behavior.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124913625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strong Security Guarantees: From Alloy to Coq (Research Poster)","authors":"Salwa Souaf, F. Loulergue","doi":"10.1109/HPCS.2018.00167","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00167","url":null,"abstract":"With the recent discoveries of widespread vulnerabilities, the use of formal methods in the design of system is more and more important, in particular in the context of Cloud computing which can be used for high performance computing applications [1]. Formal methods range from more lightweight method such as the Alloy [2] formal specification language and analyzer to more heavyweight formal methods based on proof assistants such as Isabelle/HOL or Coq [3].","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125018096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Beta Chaotic Watermarking Scheme based on DWT and SVD","authors":"Houda Soudan, R. Ejbali, M. Zaied","doi":"10.1109/HPCS.2018.00107","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00107","url":null,"abstract":"Digital image watermarking is today the most sophisticated technique for protecting copyright and authenticity. Till now, several approaches were proposed to get better results in term of imperceptibility and robustness. In this paper, a non blind digital watermark scheme based on Discrete Wavelet Transform (DWT), Singular Values Decomposition (SVD) and Beta Chaotic Map (BCM) is proposed. Firstly, the digital watermark image is encrypted using BCM. Then, the cover image and the encrypted watermark are decomposed using DWT. The low frequency sub bands, issued from DWT decomposition, of each image are decomposed using SVD to get matrixes of singular values. These matrixes are summed to get watermarked images. The experimental results show that this scheme is robust against several attacks compared to other algorithms.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126831603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Challenges for Reliable and Large Scale Evaluation of Android Malware Analysis","authors":"Jean-François Lalande, Valérie Viet Triem Tong, M. Leslous, Pierre Graux","doi":"10.1109/HPCS.2018.00173","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00173","url":null,"abstract":"Since Android became the first smartphone operating system, malware developers have put large efforts to craft new threats uploaded to the Google Play store and other third market places. Companies and researchers now include in their activities the analysis of malware targeting smartphones. Most of the time, the problem that is addressed consists in deciding if an application should be considered as a malware or not. Nevertheless, once a malware is tagged as a malicious application, users that have been infected ask for more technical explanations about the threat they have been exposed to. Dissecting a malware requires a lot of efforts for a security analyst to be conducted and companies are in demand of new tools for automatizing the analysis.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122484230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Madeus: A Formal Deployment Model","authors":"Maverick Chardet, Hélène Coullon, Dimitri Pertin, Christian Pérez","doi":"10.1109/HPCS.2018.00118","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00118","url":null,"abstract":"Distributed software architecture is composed of multiple interacting modules, or components. Deploying such software consists in installing them on a given infrastructure and leading them to a functional state. However, since each module has its own life cycle and might have various dependencies with other modules, deploying such software is a very tedious task, particularly on massively distributed and heterogeneous infrastructures. To address this problem, many solutions have been designed to automate the deployment process. In this paper, we introduce Madeus, a component-based deployment model for complex distributed software. Madeus accurately describes the life cycle of each component by a Petri net structure, and is able to finely express the dependencies between components. The overall dependency graph it produces is then used to reduce deployment time by parallelizing deployment actions. While this increases the precision and performance of the model, it also increases its complexity. For this reason, the operational semantics needs to be clearly defined to prove results such as the termination of a deployment. In this paper, we formally describe the operational semantics of Madeus, and show how it can be used in a use case: the deployment of a real and large distributed software (i.e., OpenStack).","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128770190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Framework for the Seamless Integration of FPGA Accelerators with Big Data Analytics Frameworks in Heterogeneous Data Centers","authors":"I. Stamelos, Elias Koromilas, C. Kachris, D. Soudris","doi":"10.1109/HPCS.2018.00090","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00090","url":null,"abstract":"To face the increased network traffic in the cloud, data center operators have started adopting an heterogeneous approach in their infrastructures. Heterogeneous infrastructures, e.g. based on FPGAs, can provide higher performance and better energy-efficiency compared to the contemporary processors. However, FPGAs lack of an easy-to-use framework for the efficient deployment from high-level programming frameworks. In this paper, we present a novel framework that allows the seamless integration of FPGAs from high-level programming languages, like Java and Scala. The proposed approach provides all the required APIs for the utilization of FPGAs from these languages. The proposed scheme has been mapped on Amazon AWS f1 infrastructure and a performance evaluation is presented for two widely used machine learning algorithms.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126681272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating the Intel Skylake Xeon Processor for HPC Workloads","authors":"S. Hammond, C. Vaughan, C. Hughes","doi":"10.1109/HPCS.2018.00064","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00064","url":null,"abstract":"Despite significant advances in the porting of scientific applications to novel architectures such as compute-optimized graphics processors, many-core processor/accelerators and, even special-purpose function units, the vast majority of scientific calculations are still performed on high-performance, commodity server processors. Even in the cases of applications which have been ported to new architectures, frequent serial sections still require strong server-class processor cores to compute as fast as possible. In this paper we report on a set of benchmark studies which evaluate Intel's latest Skylake Xeon server processor. Skylake represents a significant change in the Xeon product line with wider SIMD vector units, a redesigned cache architecture, and, an increased number of memory channels. The wider vector units provide 2x improvement for some compute-intensive applications and the combined memory changes can provide close to 2x the memory bandwidth. We evaluate these new hardware features on several HPC-relevant mini-applications and benchmarks, including, STREAM, LULESH, XSBench, HPCG and SW4Lite. Together, the new hardware functions provide up to 1.8x speedup on HPC benchmark codes when compared with the previous generation Haswell processor core, providing much greater utility to a broader range of HPC applications that rely on this class of compute node.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129235853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Feature Extraction for Palmprint-Based User Authentication","authors":"Agata Giełczyk, M. Choraś, R. Kozik","doi":"10.1109/HPCS.2018.00104","DOIUrl":"https://doi.org/10.1109/HPCS.2018.00104","url":null,"abstract":"Biometry is often used as a part of the multi-factor authentication in order to improve the security of IT systems. In this paper, we propose the palmprint-based solution for user identity verification. In particular, we present a new approach to feature extraction. The proposed method is based both on texture and color information. Our experiments show that using the proposed hybrid features allows for achieving satisfactory accuracy without increasing requirements for additional computational resources. It is important from our perspective since the proposed method is dedicated to smartphones and other handhelds in mobile verification scenarios.","PeriodicalId":308138,"journal":{"name":"2018 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130673518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}