{"title":"A robust approach to the cell switch-off problem in 5G ultradense networks","authors":"F. Luna, Pablo H. Zapata-Cano, J. Valenzuela-Valdés, P. Padilla","doi":"10.1109/HPCS48598.2019.9188085","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188085","url":null,"abstract":"Ultra-dense networks (UDNs) are recognized as one of the key enabling technologies of the fifth generation (5G) networks, as they allow for an efficient spatial reuse of the spectrum, which is required to meet the traffic demands foreseen for the next coming years. However, the power consumption of UDNs, with potentially hundreds of small base stations (SBSs) within each macrocell, is a major concern for the cellular operators, and has to be properly addressed prior to the actual deployment of these 5G networks. Among the different existing approaches to address this issue, a widely accepted strategy lies in the selective deactivation of SBSs, but without compromising the QoS provided to the User Equipments (UEs). This is known as the Cell Switch-Off (CSO) problem. The typical formulation of this problem is based on estimations of the traffic demand of the User Equipments (UEs) within the network. But these estimations could not be met. This work approaches these uncertain scenarios by extending the CSO problem with additional objectives that account for the robustness of the solutions to disturbances in these traffic estimates. To do so, a computationally demanding Monte-Carlo sampling is used to evaluate each solution. To manage such an increasingly large computing cost, a parallel version of the NSGA-II algorithm that is able to run on a computing platform composed of more than 500 cores has been used. It is able to compute in roughly 2 hours, an accumulated execution time of more than 42 days, which is within the expected timeframe of operators to make changes in the network configuration.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124734626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying ADM and OpenFlow to Build High Availability Networks","authors":"James T. Yu","doi":"10.1109/HPCS48598.2019.9188189","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188189","url":null,"abstract":"This paper presents an application of using OpenFlow (OF) to build high availability networks. The approach comes from Add-Drop-Multiplexer (ADM) on the legacy SONET network. The use of ADM avoids the complexity of the MAC learning process and prevents broadcast storm in the loop topology. The proposed solution is demonstrated on Mininet with the POX controller and vSwitch. The observed failover time is within the GR-253 requirement of 50ms. We also compared the proposed approach to the fast failover group in the OF specification and showed its limitation in practical use.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"07 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128944827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploration of Clustering Algorithms effects on Mesh of Clusters based FPGA Architecture Performance","authors":"Khouloud Bouaziz, S. Chtourou, Z. Marrakchi, M. Abid, A. Obeid","doi":"10.1109/HPCS48598.2019.9188138","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188138","url":null,"abstract":"Field Programmable Gate Arrays (FPGAs) have become a popular medium for the implementation of many digital circuits. Mapping applications into FPGAs requires a set of efficient Computer-Aided Design (CAD) tools to obtain high-quality Integrated Circuits (ICs). One critical issue of FPGA implementation is the quality and efficiency of associated CAD algorithms. In this paper, we are interested in investigating clustering algorithms aspect to optimize Mesh of Clusters (MoCs) FPGA performance. In fact, the way we distribute Logic Blocks (LBs) between FPGA clusters has an important impact on performance. In this paper, we explore the effects of two clustering algorithms (First Choice (FC) and T-VPack) on MoCs FPGA architecture based only on short routing wires. This paper highlights and experimentally demonstrates that FC clustering algorithm ameliorates power consumption, area, critical path delay and energy by an average of 17%, 11%, 13% and 24% respectively compared to T-VPack for MoCs FPGA.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132329456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Resource-aware Thing Composition Approach","authors":"Z. Maamar, S. Cheikhrouhou, M. Asim, A. Qamar, T. Baker, E. Ugljanin","doi":"10.1109/HPCS48598.2019.9188186","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188186","url":null,"abstract":"This paper addresses the silo concern that undermines the participation of IoT-compliant things in composition scenarios. By analogy with composite Web services, each scenario is specified in terms of choreography and orchestration and at design-time and run-time. To define things’ execution behaviors during composition, a set of transactional properties known as pivot, retriable, and compensatable, are used allowing to decide when thing execution should be confirmed, rolledback, or stopped. Along with these properties, another set of availability properties known as limited, renewable, and nonshareable specify the resources that things consume at run-time. Not all resources are always available and hence, could impact the execution of thing composition scenarios. A case study related to Industry 4.0 is used to motivate thing composition.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125432964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A FPGA-Pipelined, High-Throughput Approach to Coarse-Grained Simulation of HPC Systems","authors":"C. Pascoe, R. Blanchard, H. Lam, G. Stitt","doi":"10.1109/HPCS48598.2019.9188129","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188129","url":null,"abstract":"Although previous studies have accelerated discreteevent simulation with various parallelization strategies, total simulation time remains prohibitive for certain use cases that require many independent simulations (e.g., design-space exploration, Monte Carlo simulation). In this paper, rather than focus solely on improved execution time for an individual simulation, we introduce an FPGA-accelerated approach that potentially sacrifices simulation latency to greatly increase throughput by many orders of magnitude. In this approach, the simulation design space is converted to an intermediate dataflow graph representation and ultimately mapped to a simulation pipeline by a custom-built compiler. We describe the design and implementation of our approach. Additionally, we present a resourcesharing strategy that greatly increases design scalability at the cost of slightly reduced simulation throughput. Although not applicable in all scenarios, we demonstrate that this approach can accelerate total simulation time for design-space exploration of HPC algorithmic/architectural co-design by up to 6 orders of magnitude when compared to the same exploration performed with a parallel software simulator.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125448918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Low-Wastage Memory Allocations for Scientific Workflows at IceCube","authors":"Carl Witt, J. Santen, U. Leser","doi":"10.1109/HPCS48598.2019.9188126","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188126","url":null,"abstract":"In scientific computing, scheduling tasks with heterogeneous resource requirements still requires users to estimate the resource usage of tasks. These estimates tend to be inaccurate in spite of laborious manual processes used to derive them. We show that machine learning outperforms user estimates, and models trained at runtime improve the resource allocation for workflows. We focus on allocating main memory in batch systems, which enforce resource limits by terminating jobs.The key idea is to train prediction models that minimize the costs resulting from prediction errors rather than minimizing prediction errors. In addition, we detect and exploit opportunities to predict resource usage of individual tasks based on their input size.We evaluated our approach on a 10 month production log from the IceCube South Pole Neutrino Observatory experiment. We compare our method to the performance of the current production system and a state-of-the-art method. We show that memory allocation quality can be increased from about 50% to 70%, while at the same time allowing users to provide only rough estimates of resource usage.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121685768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating Performance and Potential of the Parallel STL Using NAS Parallel Benchmark Kernels","authors":"Nicco Mietzsch, Karl Fuerlinger","doi":"10.1109/HPCS48598.2019.9188147","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188147","url":null,"abstract":"In recent years, multicore shared memory architectures have become more and more powerful. To effectively use such machines, many frameworks are available, including OpenMP and Intel threading building blocks (TBB). Since the 2017 version of its standard, C++ provides parallel algorithmic building blocks in the form of the Parallel Standard Template Library (pSTL). Unfortunately, compiler and runtime support for these new features improves slowly and few studies on the performance and potential of the pSTL are available.Our goal in this work is to evaluate the applicability of the Parallel STL in the context of scientific and technical parallel computing. To this end, we assess the performance of the pSTL using the NAS Parallel Benchmarks (NPB). Our study shows that, while there are algorithms which are difficult to implement using the pSTL, most kernels can easily be transformed into a pSTL version, with their performance approximately on par with other parallelization approaches.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125175774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methodology for Decoupled Simulation of SystemVerilog HDL Designs","authors":"Juan-José Crespo, G. Mathey, J. L. Sánchez, F. J. Alfaro, J. Escudero-Sahuquillo, P. García, F. Quiles","doi":"10.1109/HPCS48598.2019.9188056","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188056","url":null,"abstract":"Agile hardware modeling using Hardware Description Languages (HDLs) such as SystemVerilog is greatly limited by the ability of those languages to model complex system abstractions. Often hardware designs rely on complex components not necessarily related with the task performed by the end product, for example components accomplishing debugging or instrumentation tasks. Leveraging hardware instrumentation through high-level programming languages helps designers to focus their attention on the hardware design. This allows to integrate models at different levels of abstraction more easily, enabling existing models written using high-level programming to be used in conjunction with low-level hardware components. In this article, we propose a methodology to enable interaction between components within hardware design projects and also external components written in high-level programming languages.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122867666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Automatically Optimizing PySke Programs","authors":"Jolan Philippe, F. Loulergue","doi":"10.1109/HPCS48598.2019.9188160","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188160","url":null,"abstract":"Explicit parallel programming for shared and distributed memory architectures is an efficient way to deal with data intensive computations. However approaches such as explicit threads or MPI remain difficult solutions for most programmers. Indeed they have to face different constraints such as explicit inter-processors communications or data distribution.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132654624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating GPU Performance for Deep Learning Workloads in Virtualized Environment","authors":"R. Radhakrishnan, Y. Varma, Uday Kurkure","doi":"10.1109/HPCS48598.2019.9188098","DOIUrl":"https://doi.org/10.1109/HPCS48598.2019.9188098","url":null,"abstract":"Deep Learning (DL) is the fastest growing high performance data center class workload today. Deep learning algorithms render themselves well to taking advantage of GPU parallelism, therefore GPGPU acceleration is a mainstay of the DL computing infrastructure. In this paper we evaluate virtualized GPU performance based on training of state-of-the art deep learning models. We find that there is a correlation between the amount of I/O traffic generated in the deep learning training workload and the efficiency of GPGPU performance in virtualized environments. We show that one can achieve high efficiency when using GPGPUs in virtualized and networkattached multi-GPU environments to perform highly computeintensive workloads.","PeriodicalId":371856,"journal":{"name":"2019 International Conference on High Performance Computing & Simulation (HPCS)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133506863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}