{"title":"Synchronisation for dynamic load balancing of decentralised conservative distributed simulation","authors":"Quentin Bragard, Anthony Ventresque, L. Murphy","doi":"10.1145/2601381.2601386","DOIUrl":"https://doi.org/10.1145/2601381.2601386","url":null,"abstract":"Synchronisation mechanisms are essential in distributed simulation. Some systems rely on central units to control the simulation but central units are known to be bottlenecks. If we want to avoid using a central unit to optimise the simulation speed, we lose the capacity to act on the simulation at a global scale. Being able to act on the entire simulation is an important feature which allows to dynamically load-balance a distributed simulation. While some local partitioning algorithms exist, their lack of global view reduces their efficiency. Running a global partitioning algorithm without central unit requires a synchronisation of all logical processes (LPs) at the same step. The first algorithm requires the knowledge of some topological properties of the network while the second algorithm works without any requirement. The algorithms are detailed and compared against each other. An evaluation shows the benefits of using a global dynamic load-balancing for distributed simulations.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121272341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical resource management for enhancing performance of large-scale simulations on data centers","authors":"Zengxiang Li, Xiaorong Li, Long Wang, Wentong Cai","doi":"10.1145/2601381.2601390","DOIUrl":"https://doi.org/10.1145/2601381.2601390","url":null,"abstract":"More and more interests have been shown to move large-scale simulations on modern data centers composed of a large number of virtualized multi-core computers. However, the simulation components (Federates) consolidated in the same computer may have imbalanced simulation workloads. Similarly, the computers involved in the same simulation execution (Federation) may also have imbalanced simulation workloads. Hence, federates may waste a lot of computer resources on time synchronization with each other. In this paper, a hierarchical resource management system is proposed to enhance simulation execution performance. Federates in the federation are enraptured in their individual Virtual Machines (VMs), which are consolidated on a group of virtualized multi-core computers. On the computer level, multiple VMs share the resource of the computer according to the simulation workloads of their corresponding federates. On the federation level, some VMs are migrated for workload balance purpose. Therefore, computer resources are fully utilized to conduct useful simulation workloads, avoiding the synchronization overheads. 
Experiments using synthetic and real simulation workloads have verified that the hierarchical resource management system enhances simulation performance significantly.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114071466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring many-core architecture design space for parallel discrete event simulation","authors":"Yi Zhang, Jingjing Wang, D. Ponomarev, N. Abu-Ghazaleh","doi":"10.1145/2601381.2601392","DOIUrl":"https://doi.org/10.1145/2601381.2601392","url":null,"abstract":"As multicore and manycore processor architectures are emerging and the core counts per chip continue to increase, it is important to evaluate and understand the performance and scalability of Parallel Discrete Event Simulation (PDES) on these platforms. Most existing architectures are still limited to a modest number of cores, feature simple designs and do not exhibit heterogeneity, making it impossible to perform comprehensive analysis and evaluations of PDES on these platforms. Instead, in this paper we evaluate PDES using a full-system cycle-accurate simulator of a multicore processor and memory subsystem. With this approach, it is possible to flexibly configure the simulator and perform exploration of the impact of architecture design choices on the performance of PDES. In particular, we answer the following four questions with respect to PDES performance and scalability: (1) For the same total chip area, what is the best design point in terms of the number of cores and the size of the on-chip cache? (2) What is the impact of using in-order vs. out-of-order cores? (3) What is the impact of a heterogeneous system with a mix of in-order and out-of-order cores? (4) What is the impact of object partitioning on PDES performance in heterogeneous systems? 
To answer these questions, we use MARSSx86 simulator for evaluating performance, and rely on Cacti and McPAT tools to derive the area and latency estimates for cores and caches.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131551758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power consumption of data distribution management for on-line simulations","authors":"Sabra A. Neal, Gaurav Kantikar, R. Fujimoto","doi":"10.1145/2601381.2601409","DOIUrl":"https://doi.org/10.1145/2601381.2601409","url":null,"abstract":"With the growing use of mobile devices, power aware algorithms have become essential. Data distribution management (DDM) is an approach to disseminate information that was proposed in the High Level Architecture (HLA) for modeling and simulation. This paper explores the power consumption of mobile devices used by pedestrians in an urban environment communicating through HLA DDM services operating over a mobile ad-hoc network (MANET). The computation and communication power requirements of Grid-Based and Region-Based implementation approaches to DDM are contrasted and quantitatively evaluated through experimentation and simulation.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130947419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling and simulation of data center networks","authors":"R. Alshahrani, H. Peyravi","doi":"10.1145/2601381.2601389","DOIUrl":"https://doi.org/10.1145/2601381.2601389","url":null,"abstract":"Data centers are integral part of cloud computing that support Web services, online social networking, data analysis, computation intensive applications and scientific computing. They require high performance components for their inter-process communication, storage and sub-communication systems. The performance bottleneck that used to be the processing power has now been shifted to communication speed within data centers. The performance of a data center, in terms of throughput and delay, is directly related to the performance of the underlying internal communication network. In this paper, we introduce an analytical model that can be used to evaluate the underlying network architecture in data centers. The model can further be used to develop simulation tools that extend the scope of performance evaluation beyond what it can be achieved by the theoretical model in terms of various network topologies, different traffic distributions, scalability, and load balancing. While the model is generic, we focus on its implementation for fat-tree networks that are widely used in data centers. The theoretical results are compared and validated with the simulation results for several network configurations. 
The results of this analysis provide a basis for data center network design and optimization.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116012589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPU-assisted hybrid network traffic model","authors":"Jason Liu, Yuan Liu, Zhihui Du, Ting Li","doi":"10.1145/2601381.2601382","DOIUrl":"https://doi.org/10.1145/2601381.2601382","url":null,"abstract":"Large-scale network simulation imposes extremely high computing demand. While parallel processing techniques allows network simulation to scale up and benefit from contemporary high-end computing platforms, multi-resolutional modeling techniques, which differentiate network traffic representations in network models, can substantially reduce the computational requirement. In this paper, we present a novel method for offloading computationally intensive bulk traffic calculations to the background onto GPU, while leaving CPU to simulate detailed network transactions in the foreground. We present a hybrid traffic model that combines the foreground packet-oriented discrete-event simulation on CPU with the background fluid-based numerical calculations on GPU. In particular, we present several optimizations to efficiently integrate packet and fluid flows in simulation with overlapping computations on CPU and GPU. These optimizations exploit the lookahead inherent to the fluid equations, and take advantage of batch runs with fix-up computation and on-demand prefetching to reduce the frequency of interactions between CPU and GPU. 
Experiments show that our GPU-assisted hybrid traffic model can achieve substantial performance improvement over the CPU-only approach, while still maintaining good accuracy.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122388766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lock-free pending event set management in time warp","authors":"Sounak Gupta, P. Wilsey","doi":"10.1145/2601381.2601393","DOIUrl":"https://doi.org/10.1145/2601381.2601393","url":null,"abstract":"The rapid growth in the parallelism of multi-core processors has opened up new opportunities and challenges for parallel simulation discrete event simulation (PDES). PDES simulators attempt to find parallelism within the pending event set to achieve speedup. Typically the pending event set is sorted to preserve the causal orders of the contained events. Sorting is a key aspect that amplifies contention for exclusive access to the shared event scheduler and events are generally scheduled to follow the time-based order of the pending events. In this work we leverage a Ladder Queue data structure to partition the pending events into groups (called buckets) arranged by adjacent and short regions of time. We assume that the pending events within any one bucket are causally independent and schedule them for execution without sorting and without consideration of their total time-based order. We use the Time Warp mechanism to recover whenever actual dependencies arise. Due to the lack of need for sorting, we further extend our pending event data structure so that it can be organized for lock-free access. Experimental results show consistent speedup for all studied configurations and simulation models. 
The speedups range from 1.1 to 1.49 with higher speedups occurring with higher thread counts where contention for the shared event set becomes more problematic with a conventional mutex locking mechanism.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"26 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123592013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A case study in using massively parallel simulation for extreme-scale torus network codesign","authors":"M. Mubarak, C. Carothers, R. Ross, P. Carns","doi":"10.1145/2601381.2601383","DOIUrl":"https://doi.org/10.1145/2601381.2601383","url":null,"abstract":"A high-bandwidth, low-latency interconnect will be a critical component of future exascale systems. The torus network topology, which uses multidimensional network links to improve path diversity and exploit locality between nodes, is a potential candidate for exascale interconnects.\u0000 The communication behavior of large-scale scientific applications running on future exascale networks is particularly important and analytical/algorithmic models alone cannot deduce it. Therefore, before building systems, it is important to explore the design space and performance of candidate exascale interconnects by using simulation. We improve upon previous work in this area and present a methodology for modeling and simulating a high-fidelity, validated, and scalable torus network topology at a packet-chunk level detail using the Rensselaer Optimistic Simulation System (ROSS). We execute various configurations of a 1.3 million node torus network model in order to examine the effect of torus dimensionality on network performance with relevant HPC traffic patterns. To the best of our knowledge, these are the largest torus network simulations that are carried out at such a detailed fidelity. In terms of simulation performance, a 1.3 million node, 9-D torus network model is shown to process a simulated exascale-class workload of nearest-neighbor traffic with 100 million message injections per second per node using 65,536 Blue Gene/Q cores in a simulation run-time of only 25 seconds. We also demonstrate that massive-scale simulations are a critical tool in exascale system design since small-scale torus simulations are not always indicative of the network behavior at an exascale size. 
The take-away message from this case study is that massively parallel simulation is a key enabler for effective extreme-scale network codesign.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131390312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Securing industrial control systems with a simulation-based verification system","authors":"Dong Jin, Y. Ni","doi":"10.1145/2601381.2601411","DOIUrl":"https://doi.org/10.1145/2601381.2601411","url":null,"abstract":"Today's quality of life is highly dependent on the successful operation of many large-scale industrial control systems. To enhance their protection against cyber-attacks and operational errors, we develop a simulation-based verification framework with cross-layer verification techniques that allow comprehensive analysis of the entire ICS-specific stack, including application, protocol, and network layers.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"677 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123826914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mesoscopic traffic simulation on CPU/GPU","authors":"Yan Xu, Gary S. H. Tan, Xiaosong Li, Xiao Song","doi":"10.1145/2601381.2601396","DOIUrl":"https://doi.org/10.1145/2601381.2601396","url":null,"abstract":"Mesoscopic traffic simulation is an important branch of technology to support offline large-scale simulation-based traffic planning and online simulation-based traffic management. One of the major concerns using mesoscopic traffic simulations is the performance, which means the required time to simulate a traffic scenario. At the same time, the GPU has recently been a success, because of its massive performance compared to the CPU. Thus, a critical question is \"whether the GPU can be a potential high-performance platform for mesoscopic traffic simulations\"? To the best of our knowledge, there is no clear answer in the research area. In this paper, we firstly propose a comprehensive framework to run a traditional time-stepped mesoscopic traffic simulation on CPU/GPU. Then, we design a boundary processing method to guarantee the correctness of running mesoscopic supply traffic simulations on the GPU. Thirdly, the proposed mesoscopic traffic simulation framework is demonstrated to simulate 100,000 vehicles moving on a large-scale grid road network. In this case study, running a mesoscopic supply traffic simulation on a GPU (GeForce GT 650M) gives 11.2 times speedup, compared with running the same supply simulation on a CPU core (Intel E5-2620). In the end, this paper explains the theoretical limitation of running mesoscopic supply traffic simulations on the GPU. 
In conclusion, regardless of high system complexity, the proposed mesoscopic traffic simulation framework on CPU/GPU provides an innovative and promising solution for high-performance mesoscopic traffic simulations.","PeriodicalId":255272,"journal":{"name":"SIGSIM Principles of Advanced Discrete Simulation","volume":"57 Pt 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126231839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}