{"title":"Pseudo-Random Number Generation on GP-GPU","authors":"Jonathan Passerat-Palmbach, C. Mazel, D. Hill","doi":"10.1109/PADS.2011.5936751","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936751","url":null,"abstract":"Random number generation is a key element of stochastic simulations. It has been widely studied for sequential applications purposes, enabling us to reliably use pseudo-random numbers in this case. Unfortunately, we cannot be so enthusiastic when dealing with parallel stochastic simulations. Many applications still neglect random stream parallelization, leading to potentially biased results. Particular parallel execution platforms, such as Graphics Processing Units (GPUs), add their constraints to those of Pseudo-Random Number Generators (PRNGs) used in parallel. It results in a situation where potential biases can be combined to performance drops when parallelization of random streams has not been carried out rigorously. Here, we propose criteria guiding the design of good GPU-enabled PRNGs. We enhance our comments with a study of the techniques aiming to correctly parallelize random streams, in the context of GPU-enabled stochastic simulations.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116936049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Message Clustering for Distributed Agent-Based Systems","authors":"Cole Sherer, Abhishek Gupta, M. Hybinette","doi":"10.1109/PADS.2011.5936756","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936756","url":null,"abstract":"Many agent-based simulation kernels rely on message passing in their core implementation. As the number of agents in a simulation increases or as the complexity of their communication expands the number of messages can increase exponentially. This is troublesome because the message content itself may be quite small, while the overhead, including message headers can dominate bandwidth and processing time. In these cases message passing becomes a bottleneck to scalability. The overhead of message exchange may saturate the network and degrade performance of the simulation. One approach to this challenge that has been investigated in related networking and simulation research centers is combining or \"piggy-backing\" multiple small messages together with a consolidated header. In many applications performance improves as larger, but fewer messages are sent. However, the pattern of message passing is different in the case of agent-based simulation (ABS), and this approach has not yet been explored for ABS systems. In this work we provide an overview of the design and implementation of a message piggy-backing approach for ABS systems using the SASSY platform. SASSY is a hybrid, large-scale distributed ABS system that provides an agent-based API on top of a PDES kernel. We provide a comparative performance evaluation for implementations in SASSY with a combined RMI and shared memory message passing approach. We also show performance of our new adaptive message clustering mechanism that clusters messages when advantageous and avoids clustering when the overhead of clustering dominates.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129241270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximation of Dispatching Rules in Manufacturing Control Using Artificial Neural Networks","authors":"S. Bergmann, Sören Stelzer","doi":"10.1109/PADS.2011.5936774","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936774","url":null,"abstract":"Automatic generation of simulation models has been a recurring topic in scientific papers for years. A common problem of all known model generation approaches is the generation of dynamic behavior, e.g. buffering or control strategies. This paper introduces a novel methodology for generation of dynamic behavior, based on artificial neural networks, which is usable directly in the simulation. We also test the approach in a manageable scenario; all results are illustrated via small simulation experiments.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116479050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Empirical Study on Entity Interaction Graph of Large-Scale Parallel Simulations","authors":"Bonan Hou, Yiping Yao, Shaoliang Peng","doi":"10.1109/PADS.2011.5936762","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936762","url":null,"abstract":"The entity interaction graph is an important metaphor for understanding the simulation execution of complex systems on parallel computing environment. Current performance tuning techniques often explore interrelated factors affecting performance, but ignore systematic analysis on the structure and behavior of entity interactions. This paper reports an empirical study on the entity interaction graphs of three systems chosen from different domains: Internet models, molecular dynamics, and social dynamics, respectively. The results of complex networks analysis on the entity interaction graphs demonstrate that the heterogeneous distribution of connections and highly clustering are universal in these complex systems. Generally, these properties are not obvious at the system modeling stage. Moreover, mutual information theory is used to measure the \"principle of persistence\" as the predictability of partitioning on multiple processors. This study facilitates better understanding and quantifying of the interaction complexity and provides implications on performance tuning for parallel simulation of large-scale complex systems.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"245 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121707678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Well-Balanced Time Warp System on Multi-Core Environments","authors":"Li-li Chen, Yashuai Lü, Yi-Ping Yao, Shaoliang Peng, Ling-Da Wu","doi":"10.1109/PADS.2011.5936752","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936752","url":null,"abstract":"The current trend in processor architecture design is the integration of multiple cores on a single processor. This trend has shifted the burden of improving program execution speed from chip manufacturers to software developers. Thus, in the software domain, one of the research focuses is on modifying software platforms to efficiently utilize the computation resources of multi-core processors. In this paper, we propose a global schedule mechanism based on a distributed event queue to improve the performance of Time Warp system on multi-core systems and give some experiences on the implementation of the shared attribute/state access mechanism based on transactional space-time memory. Furthermore, this paper comprehensively explores how the different design choices and techniques affect the performance of Time Warp system on a multi-core platform by various experiments. Compared with the distributed event queue local schedule mechanism, the experiment results show that the distributed queue global schedule mechanism can effectively decrease rollback rate and balance the workloads at a low event scheduling cost for Time Warp system on multi-core platforms; the STM-based shared attribute access mechanism prominently outperforms the conventional \"pull\" mechanism on multi-core platforms.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130733024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distribution of Random Streams in Stochastic Models in the Age of Multi-Core and Manycore Processors","authors":"D. Hill","doi":"10.1109/PADS.2011.5936759","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936759","url":null,"abstract":"Stochastic Modeling & Simulation require very good random sources which have now been available for more than a decade. However the parallelization of random streams remains delicate for many practitioners. Recent experience shows that a misuse of pseudo-random number parallelization techniques is not infrequent in various simulation domains. This talk will present the state of the art in the distribution of pseudo-random numbers and will also discuss the use of such techniques for hybrid computing with GP-GPUs.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124673010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Billion-Node Torus Networks Using Massively Parallel Discrete-Event Simulation","authors":"Ning Liu, C. Carothers","doi":"10.1109/PADS.2011.5936761","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936761","url":null,"abstract":"Exascale supercomputers will have millions or even hundreds of millions of processing cores and the potential for nearly billion-way parallelism. Exascale compute and data storage architectures will be critically dependent on the interconnection network. The most popular interconnection network for current and future supercomputer systems is the torus (e.g., k-ary, n-cube). This paper focuses on the modeling and simulation of ultra-large-scale torus networks using Rensselaer's Optimistic Simulator System (ROSS). We compare real communication delays between our model and the actual torus network from the Blue Gene/L using 2,048 processors. Our performance experiments demonstrate the ability to simulate million to billion-node torus networks. The torus network model for a 16-million-node configuration shows a high degree of strong scaling when going from 1,024 cores to 32,768 cores on Blue Gene/L with a peak event-rate of nearly 5 billion events per second. Finally, we demonstrate the performance of our torus network model configured with 1-billion-nodes using up to 16,384 Blue Gene/L processors.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129679835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trading Computation Time for Synchronization Time in Spatial Distributed Simulation","authors":"R. Zunino","doi":"10.1109/PADS.2011.5936766","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936766","url":null,"abstract":"We consider a class of models describing generic agents (e.g. macromolecules, small organisms) which are able to travel in space, can sense the surrounding environment, and can react accordingly. In these models, we focus on individual-based simulation. We start with defining a simple centralized simulation algorithm, which we then improve so to develop a distributed algorithm producing the same output. An analytical model is given to estimate the expected speedup of our distributed algorithm depending on several parameters. A main aspect of our approach is that it trades computation time for synchronization time. That is, we allow each node to perform apparently redundant computation whenever this reduces the amount of needed synchronization in such a way that the overall performance improves.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131448447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Virtual Time System for OpenVZ-Based Network Emulations","authors":"Yuhao Zheng, D. Nicol","doi":"10.1109/PADS.2011.5936745","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936745","url":null,"abstract":"Simulation and emulation are commonly used to study the behavior of communication networks, owing to the cost and complexity of exploring new ideas on actual networks. Emulations executing real code have high functional fidelity, but may not have high temporal fidelity because virtual machines usually use their host's clock. A host serializes the execution of multiple virtual machines, and time-stamps on their interactions reflect this serialization. In this paper we improve temporal fidelity of the OS level virtualization system OpenVZ by giving each virtual machine its own virtual clock. The key idea is to slightly modify the OpenVZ and OpenVZ schedulers so as to measure the time used by virtual machines in computation (as the basis for virtual execution time) and have Linux return virtual times to virtual machines, but ordinary wall clock time to other processes. Our system simulates the functional and temporal behavior of the communication network between emulated processes, and controls advancement of virtual time throughout the system. We evaluate our system against a baseline of actual wireless network measurements, and observe high temporal accuracy. Moreover, we show that the implementation overhead of our system is as low as 3%. Our results show that it is possible to have a network simulator driven by real workloads that gives its emulated hosts temporal accuracy.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134395105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulation of Complex Manufacturing Systems via HLA-Based Infrastructure","authors":"Giulia Pedrielli, Paola Scavardone, T. Tolio, M. Sacco, W. Terkaj","doi":"10.1109/PADS.2011.5936772","DOIUrl":"https://doi.org/10.1109/PADS.2011.5936772","url":null,"abstract":"Manufacturing systems can be thought of as production network nodes whose relations have a strong impact on design and analysis of each system. One of the most common techniques to support these tasks is Discrete-Event Simulation. The state-of-the-art commercial simulators are already adopted to analyze complex networked systems, but the development of a monolithic simulation model can be too complex or even infeasible when a detailed description of the nodes is not available outside the \"owner\" of the node. In these cases the problem can be decomposed by modeling complex systems with various simulators that interoperate in a synchronized manner. Herein, the integration of simulators is addressed by taking as a reference the High Level Architecture (HLA) and the research carried out by Commercial-off-the-shelf Simulation Package Interoperability (CSPI) Product Development Group (PDG). This paper proposes modifications to CSPI-PDG protocols and to use patterns of how HLA can be effectively adopted to support CSP interoperability: a new solution for the synchronous entity passing problem and a modification to the Entity Transfer Specification are presented. The resulting infrastructure is validated and tested over a realistic industrial case.","PeriodicalId":330788,"journal":{"name":"2011 IEEE Workshop on Principles of Advanced and Distributed Simulation","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123737977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}