{"title":"Multi-threading Semantics for Highly Heterogeneous Systems Using Mobile Threads","authors":"P. Kogge","doi":"10.1109/HPCS48598.2019.9188165","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188165","url":null,"abstract":"Heterogeneous architectures are becoming the norm. The results are nodes that are not only multi-threaded, but simultaneously multi-threaded across several different instruction sets and core designs. Unfortunately, programming models for such systems are still evolving, and are nowhere near adequate as we move into an era of extreme heterogeneity with many new accelerator designs. This paper discusses the current range of multi-threading models and what features are liable to be needed for such future architectures. In addition, we suggest the potential value of using a new threading model, termed migrating threads, that may be an excellent match for a common “glue” to efficiently combine all the emerging heterogeneity.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117126186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Energy Consumption and Accuracy of Multithreaded Embedded Runge-Kutta Methods","authors":"T. Rauber, G. Rünger","doi":"10.1109/HPCS48598.2019.9188214","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188214","url":null,"abstract":"The family of Runge-Kutta (RK) methods provides iterative methods for the numerical approximation of solutions of ordinary differential equations (ODEs). Embedded RK methods combine the approximation computation with a step-size control exploiting an embedded solution and a predefined tolerance value. An important aspect of the computation is the accuracy that refers to how closely the approximation solution agrees with the true solution of the ODE system. The computation of solutions with a high accuracy might have a high computational demand and a high energy consumption. This article investigates the execution time and the energy consumption for a varying number of cores and varying operational frequencies. Additionally the influence of the predefined tolerance value and the resulting numerical accuracy is considered for two different types of ODE systems with different execution behavior.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123846839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Open Data Market Architecture and Functional Components","authors":"Y. Demchenko, R. Cushing, W. Los, P. Grosso, C. D. Laat, L. Gommans","doi":"10.1109/HPCS48598.2019.9188195","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188195","url":null,"abstract":"This paper discusses the principles of organisation and infrastructure components of Open Data Markets (ODM) that would facilitate secure and trusted data exchange between data market participants, and other cooperating organisations. The paper provides a definition of the data properties as economic goods and identifies the generic characteristics of ODM as a Service. This is followed by a detailed description of the generic data market infrastructure that can be provisioned on demand for a group of cooperating parties. The proposed data market infrastructure and its operation are employing blockchain technologies for securing data provenance and providing a basis for data monetisation. Suggestions for trust management and data quality assurance are discussed.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125912994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design Space Exploration of Embedded Applications on Heterogeneous CPU-GPU Platforms","authors":"A. Siddiqui, G. Khan","doi":"10.1109/HPCS48598.2019.9188052","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188052","url":null,"abstract":"CPU-GPU platforms possess the potential of enhancing the performance of applications through some unique and diverse capabilities of both CPU-GPU devices. As a result, the methodologies for CPU/GPU system design space exploration for various applications are now considerably more challenging on these heterogeneous platforms. In this paper, we present a heuristic algorithm for partitioning the computation of applications between a CPU and GPU, while satisfying the user-defined constraints. Our methodology leverages the SIMD-related computing and hierarchical memory model of GPUs to optimize application mapping and allocation to CPU-GPU systems. The algorithm partitions the application, which is specified as a Directed Acyclic Graph (DAG), for a CPU-GPU platform to meet the objectives specified by the user. The effectiveness of our methodology is demonstrated by efficiently partitioning and executing MJPEG decoder and benchmark applications on a CPU-GPU system.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124961217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using On-Demand File Systems in HPC Environments","authors":"Mehmet Soysal, M. Berghoff, T. Zirwes, Marc-André Vef, Sebastian Oeste, A. Brinkmann, W. Nagel, A. Streit","doi":"10.1109/HPCS48598.2019.9188216","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188216","url":null,"abstract":"In modern HPC systems, parallel (distributed) file systems are used to allow fast access from and to the storage infrastructure. However, I/O performance in large-scale HPC systems has failed to keep up with the increase in computational power. As a result, the I/O subsystem, which also has to cope with a large number of demanding metadata operations, is often the bottleneck of the entire HPC system. In some cases, even a single badly behaving application can be held responsible for slowing down the entire HPC system, disrupting other applications that use the same I/O subsystem. These kinds of situations are likely to become more frequent in the future with larger and more powerful HPC systems. In this work, we present a simple solution for applications with very high I/O demands. Our proposed solution is to create a private parallel file system on demand for an HPC job and use the node-local storage devices, e.g., solid-state disks (SSDs). We show that this feature is easy to add to an existing HPC environment and requires only minimal configuration to the system. We conclude that the impact on running applications is manageable and the advantages to applications that generate a high load outweigh the disadvantages. We show that in some cases applications may run slower, but the reduction of load on the global file system prevails in these cases.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127240413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Container Application Migration with Load Balanced and Adaptive Parallel TCP","authors":"Wongsatorn Thongthavorn, Prapaporn Rattanatamrong","doi":"10.1109/HPCS48598.2019.9188218","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188218","url":null,"abstract":"Application migration over a Wide Area Network (WAN) is needed in many scenarios, such as disaster recovery and service hand-off in edge clouds. Modern distributed applications are virtualized with multiple virtual machines or containers. This paper focuses on parallel multi-container migration in WAN environments by utilizing multiple TCP connections over a single direct path (a.k.a. parallel TCP). Our application migration middleware framework utilizes a feedback controller to determine a proper number of parallel container migrations (i.e., the parallel window) based on changing network bandwidth. The middleware’s scheduler then selects migration requests for the parallel window to load balance multiple pairs of hosts. The goal of our migration is to achieve the best possible balance between optimizing the total migration time and the average individual migration time. This differs from most existing live migration works, which attempt to optimize mainly the down time. Our proposed framework is generalized and not restricted to any particular virtualization technology implementation. For performance evaluation, we conducted a WAN-emulated experiment in static and dynamic network settings. The performance evaluation results show that the total migration time using our feedback controller is less than that of the sequential migration by 32.7% in the static network, and 43.9% in the dynamic network. Moreover, while achieving total migration time comparable to that of the best fixed parallel migration window, our approach can reduce the average individual migration time by 62.7% in the dynamic network setting.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128166630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fall Detection with Privacy as Standard","authors":"Dylan Kelly, D. Delaney, A. Nag","doi":"10.1109/HPCS48598.2019.9188066","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188066","url":null,"abstract":"Current ambient assisted-living (AAL) systems used to detect falls in the elderly often rely either on the person in question being capable of pressing a panic button to alert others of their fall, or on a wearable device which detects impacts. These systems can be invasive, negatively impacting an individual’s sense of independence and privacy, which may in turn lead to a lower quality of life. This paper aims to examine a non-invasive means of detecting falls within the home, focusing on three separate approaches to detection: waist-worn, computer-vision-based, and novel in-shoe systems. The effectiveness of each individual system is explored, and their effectiveness in detecting falls versus the privacy they provide is examined. The machine learning model which was trained for use with the waist-worn system achieved an accuracy of 74%, with a sensitivity of ~70%, using the limited dataset available for this preliminary study. The computer-vision system can accurately detect individuals in a scene as well as fall scenarios; however, drastic lighting changes negatively impact the system’s performance. Our in-shoe system achieved a zero false positive rate with an accuracy of ~67%.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121415639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-based Multi-stage Offline Handwritten Mathematical Symbol Recognition using Deep Learning","authors":"Sui Kun Guan, M. Moh, Teng-Sheng Moh","doi":"10.1109/HPCS48598.2019.9188180","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188180","url":null,"abstract":"We propose a context-based multi-stage machine learning (ML) architecture for offline handwritten mathematical symbol recognition. In the absence of context information, the first stage of the architecture acts as a generalized method of training a Multi-Column Deep Neural Network (MCDNN) model for isolated symbol recognition. The second stage trains a deep convolutional neural network that further classifies ambiguous symbols based on each symbol’s context information. To further improve the classification accuracy, we develop a set of rules in the third stage to classify ambiguous symbols so as to avoid violating some mathematical syntax rules. The proposed method is evaluated using the Competition on Recognition of Online Handwritten Mathematical Expressions (CROHME) dataset. Experiments show that the proposed architecture outperforms all other previous approaches and achieves state-of-the-art accuracy on both the CROHME 2013 and 2016 datasets in offline handwritten mathematical symbol recognition. We believe the proposed multi-stage context-based ML architecture would have wide applications in handwritten symbol recognition.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"45 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115814440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Polyhedral Tensor Schedulers","authors":"Benoît Meister, E. Papenhausen, B. Pradelle","doi":"10.1109/HPCS48598.2019.9188233","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188233","url":null,"abstract":"Compiler optimizations based on the polyhedral model are able to automatically parallelize and optimize loop-based code. We acknowledge that while polyhedral techniques can represent a broad set of program transformations, important classes of programs could be parallelized just as well using less general but more tractable techniques. We apply this general idea to the polyhedral scheduling phase, which is one of the typical performance bottlenecks of polyhedral compilation. We focus on a class of programs in which enough parallelism is already exposed in the source program, and which includes Deep Learning layers and combinations thereof, as well as multilinear algebra kernels. We call these programs “tensor codes” and consequently call “tensor schedulers” the tractable polyhedral scheduling techniques presented here. The general idea is that we can significantly speed up polyhedral scheduling by restricting the set of transformations considered. As an extra benefit, having a small search space allows us to introduce non-linear cost models, which fills a gap in polyhedral cost models.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117192059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Traffic Workloads in Data-center Network Simulation Tools","authors":"Luis Gonzalez-Naharro, J. Escudero-Sahuquillo, P. García, F. Quiles, J. Duato, Wenhao Sun, Xiang Yu, Hewen Zheng","doi":"10.1109/HPCS48598.2019.9188099","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188099","url":null,"abstract":"POSTER PAPER Data-centers are commonly used by the most important cloud providers worldwide in order to provide storage and computing resources and, based on these resources, advanced IT services and applications. With the expected explosion of data in the next few years, Data-centers will require new architectures to cope with the new requirements of applications and users. One of the crucial subsystems within the Data-center architecture that must evolve according to the new requirements is the interconnection network or Data-center network (DCN). The DCN performance (basically, high communication bandwidth and low latency) must be guaranteed; otherwise the DCN becomes the system bottleneck. There are several key issues that DCN designers must make decisions on, such as the network topology, routing algorithm, congestion management, etc. An important aspect that impacts DCN design is the set of network communication patterns generated by applications and services. In that sense, an accurate modeling of these traffic workloads would help network designers to make better decisions. In this paper, we present an analysis of the few available studies on traffic modeling for DCNs, in order to gather a set of parameters that define the behaviour of common traffic workloads. Based on these parameters, we have implemented a synthetic DCN traffic generator, which has been included in our simulation framework in order to feed the network with the inferred traffic workloads. We have conducted extensive simulations to test the impact of the parameter variation on the network performance. From the obtained results, we can conclude that the destination distribution is crucial for the network performance. Higher oversubscription of destinations generates incast scenarios that lead to congestion situations and head-of-line blocking, affecting other flows that do not contribute to the incast situation and so spoiling the network performance.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125305174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}