{"title":"Machine Learning for Interconnect Network Traffic Forecasting: Investigation and Exploitation","authors":"Xiongxiao Xu, Xin Wang, Elkin Cruz-Camacho, Christopher D. Carothers, Kevin A. Brown, Robert B. Ross, Z. Lan, Kai Shu","doi":"10.1145/3573900.3591123","DOIUrl":"https://doi.org/10.1145/3573900.3591123","url":null,"abstract":"Interconnect networks play a key role in high-performance computing (HPC) systems. Parallel discrete event simulation (PDES) has been a long-standing pillar for studying large-scale networking systems by replicating the real-world behaviors of HPC facilities. However, the simulation requirements and computational complexity of PDES are growing at an intractable rate. An active research topic is to build a surrogate-ready PDES framework where an accurate surrogate model built on machine learning can be used to forecast network traffic for improving PDES. In this paper, we make the first attempt to introduce two representative time series methods, the Autoregressive Integrated Moving Average (ARIMA) and the Adaptive Long Short-Term Memory (ADP-LSTM), to forecast the traffic in interconnect networks, using the Dragonfly system as a representative example. The proposed ADP-LSTM can efficiently adapt to the ever-changing network traffic, facilitating the forecasting capability for intricate network traffic, by incorporating a novel online learning strategy. Our preliminary analysis demonstrates promising results and shows that ADP-LSTM can consistently outperform ARIMA with significantly less time overhead.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125440115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reproducibility Report for the Paper: “Hybrid PDES Simulation of HPC Networks Using Zombie Packets”","authors":"Wen Jun Tan","doi":"10.1145/3573900.3596135","DOIUrl":"https://doi.org/10.1145/3573900.3596135","url":null,"abstract":"The examined paper presents a surrogate model for HPC networks. The authors have uploaded their artifact to Zenodo, which ensures a long-term retention of the artifact. This paper can thus receive the Artifacts Available badge. The artifact allows for easy re-running of experiments for two figures and textual output for one table. All of the dependencies are documented. The software in the artifact runs correctly with minimal intervention, and is relevant to the paper, earning the Artifacts Evaluated–Functional badge. The experimental results are reproduced in two figures and one table, which gains the Results Reproduced badge. Furthermore, since the artifact is also available on GitHub, the paper is assigned the Artifacts Evaluated–Reusable badge.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"21 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114120351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Speculative Synchronisation for Parallel Discrete Event Simulation","authors":"Andrea Piccione, Philipp Andelfinger, Alessandro Pellegrini","doi":"10.1145/3573900.3591124","DOIUrl":"https://doi.org/10.1145/3573900.3591124","url":null,"abstract":"Parallel discrete-event simulation (PDES) is a well-established family of methods to accelerate discrete-event simulations. However, the available algorithms vary substantially in the performance achievable for different models, largely preventing generic solutions applicable by modellers without expert knowledge. For instance, in Time Warp, the processing elements execute events asynchronously and speculatively with high aggressiveness, leading to frequent and costly rollbacks if misspeculations occur often. In contrast, synchronous approaches such as the new Window Racer algorithm exhibit a more cautious form of speculation. In the present paper, we combine these two fundamentally different algorithms within a single runtime environment, allowing for a choice of the best algorithm for different model segments. We describe the architecture and the algorithmic considerations to support the efficient coexistence and interaction of the algorithms without violating the correctness of the simulation. Our experiments using a synthetic benchmark and an epidemics model show that the hybrid algorithm is less sensitive to its configuration and can deliver substantially higher performance in models with varying degrees of coupling among entities compared to each algorithm on its own.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130448629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HELICSAuto: Automating the Development of Cyber-Physical Co-Simulation Framework for Smart Grids","authors":"Sayeb Mohammad Tadvin, Dong Jin, Hui Lin","doi":"10.1145/3573900.3591118","DOIUrl":"https://doi.org/10.1145/3573900.3591118","url":null,"abstract":"Co-simulation is a powerful technique integrating various simulation tools to create a unified simulation environment. It provides an in-depth understanding of the interplay between cyber and physical infrastructures in industrial control systems like smart grids. HELICS is a framework that facilitates co-simulation development by providing common interfaces to enhance simulators, synchronize their executions, and exchange information. In this paper, we propose HELICSAuto, a code instrumentation procedure that automates the integration of domain-specific simulators with HELICS APIs. HELICSAuto requires developers to label their source codes using a pre-defined syntax, after which an interpreter automatically instruments the code with minimal manual involvement. We demonstrate the effectiveness of HELICSAuto by successfully applying it to simulators based on PandaPower, PowerWorld, OPAL-RT, and PyDNP3 to create a transmission-distribution-communication co-simulation environment for complex smart grids.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"124 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134275897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study of Simulating Heterogeneous Workloads on Large-scale Interconnect Network","authors":"Xin Wang","doi":"10.1145/3573900.3593636","DOIUrl":"https://doi.org/10.1145/3573900.3593636","url":null,"abstract":"With the rapid growth of machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning applications. Simulation is a great research vehicle to understand the performance implications of co-running scientific applications with big data and machine learning workloads on large-scale systems. In this work, we propose a scalable workload manager that provides an automatic framework to facilitate hybrid workload simulation. We investigate various hybrid workloads and navigate various application-system configurations for a deeper understanding of the performance implications of a diverse mix of workloads on current and future supercomputers.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116878160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workload Interference Prevention with Intelligent Routing and Flexible Job Placement on Dragonfly","authors":"Yao Kang, Xin Wang, Z. Lan","doi":"10.1145/3573900.3591119","DOIUrl":"https://doi.org/10.1145/3573900.3591119","url":null,"abstract":"Dragonfly is an indispensable interconnect topology for exascale HPC systems. To link tens of thousands of compute nodes at a reasonable cost, Dragonfly shares network resources with the entire system such that network bandwidth is not exclusive to any single job. Since HPC systems are usually shared between multiple co-running workloads at the same time, network competition between co-existing workloads is inevitable. This network contention appears as workload interference, where a job’s network communication can be severely delayed by other jobs. Recent studies show that, compared with the deployed adaptive routing algorithms, an intelligent routing solution based on reinforcement learning named Q-adaptive routing can reduce workload interference. In addition to improving routing efficiency, job placement is a simple yet effective method to mitigate workload interference. In this study, we leverage the well-known parallel discrete event simulation toolkit, SST, to investigate workload interference on Dragonfly with three contributions. We first develop an automatic module that serves as the bridge between SST and HPC job scheduler for automatic simulation configuration and automated simulation launching. Next, we propose a flexible job placement strategy that can mitigate workload interference based on workload communication characteristics. Finally, we extensively examine the workload interference under various job placement and routing configurations.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130964876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Access to the Committed Global State in Speculative Parallel Discrete Event Simulation on Multi-core Machines","authors":"Romolo Marotta, Federica Montesano, F. Quaglia","doi":"10.1145/3573900.3591117","DOIUrl":"https://doi.org/10.1145/3573900.3591117","url":null,"abstract":"Output production and predicate detection are critical in speculative parallel discrete event simulation, since they need to take place accessing past state values, which have become committed, rather than the current state of the simulation objects, which is possibly affected by causality errors related to speculative event processing. In this article, we present an architecture that enables an effective management of the access to the committed state of any simulation object while still guaranteeing: (i) minimal impact on the forward execution of the simulation in terms of synchronization (and rollback generation) and (ii) highly balanced distribution of the tasks among all the threads running the simulation application. Our architecture is devised for speculative simulation engines running on top of shared-memory parallel machines, where worker threads fully share the simulation workload. We exploit kernel-level facilities (targeting the Linux operating system) and user-level ones, which work together to enable a suitable wall-clock-time collocation of the threads’ activities for the access to the committed global state of the simulation. We integrated our proposal within the USE (Ultimate Share-Everything) open-source simulation platform, and provide an experimental assessment of it.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121063130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RCR Report of “Zero Lookahead? Zero Problem. The Window Racer Algorithm”","authors":"Romolo Marotta","doi":"10.1145/3573900.3596134","DOIUrl":"https://doi.org/10.1145/3573900.3596134","url":null,"abstract":"The artifact evaluated in this report is relevant to the paper “Zero Lookahead? Zero Problem. The Window Racer Algorithm”. It allows running the experiments and reproducing the figures and tables. Dependencies are well documented. The process to regenerate the data presented in the article completes correctly, and the results are reproducible. Additionally, the authors have uploaded their artifact to permanent repositories, which ensures long-term retention. Thus, this paper can receive the Artifacts Available, Artifacts Evaluated–Functional, and Results Reproduced badges.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125121562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring Machine Learning Models with Spatial-Temporal Information for Interconnect Network Traffic Forecasting","authors":"Xiongxiao Xu","doi":"10.1145/3573900.3593635","DOIUrl":"https://doi.org/10.1145/3573900.3593635","url":null,"abstract":"Interconnect networks are an essential component of high-performance computing (HPC) systems. To study large-scale networking systems, parallel discrete event simulation (PDES) has been widely used to simulate real-world HPC behaviors. However, PDES simulation requirements and computational complexity are increasing rapidly, making it challenging to achieve accurate results. Therefore, researchers have been exploring a surrogate-ready PDES framework that utilizes machine learning-based surrogate models to accelerate PDES. In this paper, we present our vision and initial step to leverage machine learning models to utilize spatial-temporal information to forecast interconnect network traffic. The preliminary results show that it is promising to explore machine learning models for interconnect network traffic forecasting.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123182985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Fidelity of P4-Based Network Emulation with a Lightweight Virtual Time System","authors":"Gong Chen, Zhexuan Hu, Dong Jin","doi":"10.1145/3573900.3591120","DOIUrl":"https://doi.org/10.1145/3573900.3591120","url":null,"abstract":"P4’s data-plane programmability allows for highly customizable and programmable packet processing, enabling rapid innovation in network applications, such as virtualization, security, load balancing, and traffic engineering. Researchers extensively use Mininet, a popular network emulator, integrated with BMv2, for fast and flexible prototyping of these P4-based applications, but due to its lower performance in terms of throughput and latency compared to a production-grade software switch like Open vSwitch, it is crucial to have an accurate and scalable emulation testbed. In this paper, we develop a lightweight virtual time system and integrate it into Mininet with BMv2 to enhance fidelity and scalability. By scaling the time of interactions between containers and the underlying physical machine by a time dilation factor (TDF), we can trade time with system resources, making the emulated P4 network appear to be faster from the viewpoint of the switch/host processes in the container. Our experimental results show that the testbed can accurately emulate much larger networks with high loads, scaled by a factor of TDF with extremely low system overhead.","PeriodicalId":246048,"journal":{"name":"Proceedings of the 2023 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"260 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116921770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}