{"title":"Towards Runtime Analytics in a Parallel Performance System","authors":"A. Malony, Srinivasan Ramesh, K. Huck, Chad Wood, S. Shende","doi":"10.1109/HPCS48598.2019.9188097","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188097","url":null,"abstract":"Developers of scientific simulations use parallel performance systems to measure, analyze, and tune their applications on large-scale HPC machines. In the majority of these performance systems, the analysis takes place offline. More consequentially, if runtime analytics are desired, performance measurement infrastructures need to be designed and implemented in such a way as to make it possible. We investigate the question of how to create runtime analytics capabilities by considering this objective in a reference platform – the TAU Performance System. Our research work identifies general issues of concern and describes how these can be addressed in a new TAU-based analytics framework. Several case studies are proposed as different analytics examples. These are prototyped, evaluated on HPC machines, and discussed. The outcomes of the research study suggest that runtime analytics has merit. Furthermore, we believe the approach could directly carry forward to other parallel performance systems.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124698305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New CAD Tools to Configure Tree-Based Embedded FPGA","authors":"Hajer Saidi, M. Turki, Z. Marrakchi, M. Saleh, M. Abid","doi":"10.1109/HPCS48598.2019.9188201","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188201","url":null,"abstract":"An embedded FPGA (e-FPGA) is an IP block that can be integrated into a System-on-Chip architecture to add more flexibility and reconfigurability to the design. This e-FPGA needs to be designed, optimized and configured differently from a classic FPGA chip. In this paper, we propose a full workflow to configure a tree-based e-FPGA architecture. The workflow includes some existing tools used for mesh architecture. We modified these tools and adapted them to the proposed e-FPGA constraints. The new workflow reduces the execution runtime by an average of 57% compared to the academic workflow used for mesh architecture. The e-FPGA area is also reduced by an average of 27% compared to the mesh architecture. This optimization is due to the ability of the CAD tools to manage the different processes of the configuration workflow for the tree-based architecture.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127120795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Analysis of Compressed Batch Matrix Operations on Small Matrices","authors":"B. Gravelle, B. Norris","doi":"10.1109/HPCS48598.2019.9188206","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188206","url":null,"abstract":"Dense matrix computations with very small matrices present unique challenges for performance optimization and occupy an important space in many HPC computations, including PDE solvers, machine learning algorithms, and Kalman filters. Using batch computation can improve their performance significantly, and compressed batch (also called block-interleaved) data structures can improve performance further. In this paper we present a detailed study of how compressed batch computations use HPC hardware and how they can be most effectively tuned for cache performance.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"19 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129504819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Miniaturized Implantable Coplanar Waveguide Antenna for Biomedical Applications","authors":"A. Damaj, H. M. E. Misilmani, S. A. Chahine","doi":"10.1109/HPCS48598.2019.9188130","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188130","url":null,"abstract":"Implanting electronic devices in the human body is a delicate operation that can be hazardous and have negative consequences. Such implantable devices must be safe and small without sacrificing their performance characteristics, namely their efficiency and accuracy. This paper presents the design, simulation, fabrication, and validation of a small antenna suitable for implanting in the human body. The development comprises a small CPW-fed antenna that can operate around the 1.4 GHz standard Wireless Medical Telemetry Service (WMTS) band, with the ability to tune the frequency by merely varying the length of the monopole antenna. The behavior of the designed antenna is thoroughly analyzed using a state-of-the-art simulator. The effectiveness of the fabricated antenna is verified through extensive in vitro tests and comparisons with the response obtained through simulations. The results confirm the successful development of a small implantable antenna with appealing performance characteristics.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"52 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122848356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Staged deployment of interactive multi-application HPC workflows","authors":"W. Klijn, Sandra Díaz-Pier, A. Morrison, A. Peyser","doi":"10.1109/HPCS48598.2019.9188104","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188104","url":null,"abstract":"Running scientific workflows on a supercomputer can be a daunting task for a scientific domain specialist. Workflow management solutions (WMS) are a standard method for reducing the complexity of application deployment on high performance computing (HPC) infrastructure. We introduce the design for a middleware system that extends and combines the functionality of existing solutions in order to create a high-level, staged, user-centric operation/deployment model. This design addresses the requirements of several use cases in the life sciences, with a focus on neuroscience. In this manuscript we focus on two use cases: 1) three coupled neuronal simulators (for three different space/time scales) with in-transit visualization and 2) a closed-loop workflow optimized by machine learning, coupling a robot with a neural network simulation. We provide a detailed overview of the application-integrated monitoring in relation to the HPC job. We present here a novel usage model for large-scale interactive multi-application workflows running on HPC systems, which aims to reduce the complexity of deployment and execution, thus enabling new science.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115829708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weighted Two-Levels Secret Sharing Scheme for Multi-Clouds Data Storage with Increased Reliability","authors":"Vanessa Miranda-López, Andrei Tchernykh, M. Babenko, Viktor Andreevich Kuchukov, M. Deryabin, E. Golimblevskaia, Egor Shiryaev, A. Avetisyan, R. Rivera-Rodríguez, G. Radchenko, E. Talbi","doi":"10.1109/HPCS48598.2019.9188057","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188057","url":null,"abstract":"Cloud storage delivered as a service is available to the general public. As cloud storage prices have fallen, such services have moved into the mainstream of storage technology. However, various factors discourage many potential users from using them intensively. There are high risks of confidentiality, integrity, and availability violations associated with the loss of information, denial of access, technical failures, etc. In this article, we propose a two-level secret sharing scheme (TL-SSS) based on a residue number system (RNS) for configurable, reliable, and secure distributed data storage. RNS moduli of a special type increase the reliability of the data storage system and reduce the computational complexity of data encoding and decoding from linear-logarithmic to linear. TL-SSS uses a weighted data access structure: it creates and distributes data shares according to the characteristics of the cloud storage providers under various scenarios. We provide a solution that improves system reliability without reducing the security level. In contrast to classical solutions, it can restore the data with fewer available shares than state-of-the-art approaches.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124369922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Co-execution on Commodity Heterogeneous Systems: Optimizations for Time-Constrained Scenarios","authors":"Raúl Nozal, J. L. Bosque, R. Beivide","doi":"10.1109/HPCS48598.2019.9188188","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188188","url":null,"abstract":"Heterogeneous systems are present everywhere from powerful supercomputers to mobile devices, including desktop computers, thanks to their excellent performance and energy efficiency. The ubiquity of these architectures in both desktop systems and medium-sized servers provides enough variability to address a wide range of problems, such as multimedia workloads, video encoding, image filtering and inference in machine learning. Due to this heterogeneity, some efforts have been made to reduce the programming effort and preserve performance portability, but these systems still pose a set of challenges. The context in which applications offload the workload, along with the management overheads introduced by co-execution, penalizes the performance gains in time-constrained scenarios. Therefore, this paper proposes optimizations for the EngineCL runtime to reduce this penalty when co-executing on commodity systems, as well as algorithmic improvements to load balancing. An exhaustive experimental evaluation is performed, showing optimization improvements of 7.5% and 17.4% for binary and ROI-based offloading modes, respectively. Thanks to all the optimizations, the new load balancing algorithm is always the most efficient scheduling configuration, achieving an average efficiency of 0.84 under a pessimistic scenario.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124617289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Counters based Power Modeling of Mobile GPUs using Deep Learning","authors":"Nadjib Mammeri, Markus Neu, S. Lal, B. Juurlink","doi":"10.1109/HPCS48598.2019.9188139","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188139","url":null,"abstract":"GPUs have recently become important computational units on mobile devices, resulting in heterogeneous devices that can run a variety of parallel processing applications. While developing and optimizing such applications, estimating power consumption is of immense importance, as energy efficiency has become the key design constraint on these platforms. In this work, we apply deep learning techniques to build a predictive model for estimating the power consumption of parallel applications on a heterogeneous mobile SoC. Our model is an artificial neural network (NN) trained using CPU and GPU hardware performance counters along with measured power data. The model is trained and evaluated with data collected using a set of OpenGL graphics workloads as well as OpenCL compute benchmarks. Our evaluations show that our model can achieve accurate power estimates with a mean relative error of 4.47% with respect to real power measurements. When compared to other models, our NN model is about 3.3x better than a statistical linear regression model and 2x better than a state-of-the-art NN model.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114718804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Weighted Centroid Localization Algorithm for Wireless Sensor Networks","authors":"Abdelali Hadir, K. Zine-dine, M. Bakhouya","doi":"10.1109/HPCS48598.2019.9188226","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188226","url":null,"abstract":"Localization of sensor nodes is considered the most important service for many IoT applications. In recent decades, many localization algorithms have been presented to provide the location of sensor nodes, but only a limited number of these algorithms have been introduced to accurately estimate the location of sensor nodes in the Internet of Things (IoT). In this paper, we propose localization algorithms for sensor nodes in WSNs, based on the Centroid localization algorithm, for the Internet of Things. The proposed algorithm is based on a new weighted formula to calculate the position of unknown nodes. Our algorithms guarantee that the nodes estimate their own positions with better localization accuracy than the Centroid algorithm and DV-Hop in all considered scenarios. Moreover, the simulation results confirm that the introduced algorithms can significantly reduce the localization error in Wireless Sensor Networks.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124431757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Auto-tuning methodology for configuration and application parameters of hybrid CPU + GPU parallel systems based on expert knowledge","authors":"P. Czarnul, P. Rosciszewski","doi":"10.1109/HPCS48598.2019.9188060","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188060","url":null,"abstract":"Auto-tuning of configuration and application parameters makes it possible to achieve significant performance gains in many contemporary compute-intensive applications. Feasible search spaces of parameters tend to become too big to allow for exhaustive search in the auto-tuning process. Expert knowledge about the utilized computing systems becomes useful to prune the search space, and new methodologies are needed in the face of emerging heterogeneous computing architectures. In this paper we propose an auto-tuning methodology for hybrid CPU/GPU applications that takes into account previous execution experiences, along with an automated tool for iterative testing of chosen combinations of configuration and application-related parameters. Experimental results, based on a parallel similarity search application executed on three different CPU + GPU parallel systems, show that the proposed methodology achieves execution times at most 8% worse than a search algorithm that performs a full search over combinations of application parameters, while taking at most 26% of the latter's time.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122183389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}