{"title":"Massively Concurrent Red-Black Trees with Hardware Transactional Memory","authors":"D. Siakavaras, K. Nikas, G. Goumas, N. Koziris","doi":"10.1109/PDP.2016.65","DOIUrl":"https://doi.org/10.1109/PDP.2016.65","url":null,"abstract":"Hardware Transactional Memory (HTM) is nowadays available in several commercial and HPC targeted processors and in the future it will likely be available on systems that can accommodate a very large number of threads. Thus, it is essential for the research community to target on evaluating HTM on as many cores as possible in order to understand the virtues and limitations that come with it. In this paper we utilize HTM to parallelize accesses on a classic data structure, a red-black tree. With minimal programming effort, we implement a red-black tree by enclosing each operation in a single HTM transaction and evaluate it on two servers equipped with Intel Haswell-EP and IBM Power8 processors, supporting a large number of hardware threads, namely 56 and 160 respectively. Our evaluation reveals that applying HTM in such a simplistic manner allows scalability for up to a limited number of hardware threads. To fully utilize the underlying hardware we apply different optimizations on each platform.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123448837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S4 Applications Simulator for Performance Evaluation","authors":"Rafael Soto Gallardo, C. Bonacic, Mauricio Marín","doi":"10.1109/PDP.2016.128","DOIUrl":"https://doi.org/10.1109/PDP.2016.128","url":null,"abstract":"Big Data is now a widely studied concept in the field of massive processing of information, but testing systems and applications in this field is a difficult task, because the real environments to be used are of a large scale and many times impossible to reproduce. Usually, for this task executions in virtual settings, called simulations, are used. The present article presents a simulator for applications developed to be executed on the Apache S4 distributed computing platform for multiple hardware and software scenarios. A version in which mobile devices as well as processing units are used is proposed. The results show that the simulator without mobile devices gets good prediction performance results considering the processing and communication times between the elements of the S4 applications, and the other version with mobile systems shows a decrease in performance in different simulation configurations. However, adding replication in the elements that are communicated with the cellphones has shown substantial improvements in the application's performance.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"293 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123743407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SOF: Zero Configuration Simulation Optimization Framework on the Cloud","authors":"M. Carillo, G. Cordasco, Flavio Serrapica, V. Scarano, Carmine Spagnuolo, Przemysław Szufel","doi":"10.1109/PDP.2016.22","DOIUrl":"https://doi.org/10.1109/PDP.2016.22","url":null,"abstract":"Simulation models are becoming an increasingly popular tool for the analysis and optimization of complex real systems in different fields. Finding an optimal system design requires performing a large parameter sweep. In this paper, we present the design of SOF (Simulation Optimization and exploration Framework on the cloud), a framework which exploits the computing power of a cloud computational environment in order to realize effective and efficient simulation optimization strategies. SOF offers several attractive features: SOF requires \"zero configuration\" as it does not require any additional software installed on the remote node, SOF is transparent to the user, since the user is totally unaware that system operates on a distributed environment, SOF is highly customizable and programmable, since it enables the running of different simulation optimization scenarios on different simulation toolkits. The tool has been fully developed and is available on a public repository under the Apache public licence.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129026078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Method to Improve Adaptivity of Odd-Even Routing Algorithm in Mesh NoCs","authors":"Mohammad Sadrosadati, Ramin Bashizade, Shahin Roozkhosh, Ali Shafiee, H. Sarbazi-Azad","doi":"10.1109/PDP.2016.61","DOIUrl":"https://doi.org/10.1109/PDP.2016.61","url":null,"abstract":"Adaptive routing algorithms help balancing the resource utilization in different parts of the network and hence, prevent a resource becoming the performance bottleneck while other resources are still under-utilized. In this paper, we present a novel approach, called Preemptive Waiting, which is applied to Odd-Even routing algorithm (PWOE). PWOE postpones the saturation traffic rate of NoC by 13.4% compared to OE, under synthetic traffic loads.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126840862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing and Managing Risk by Simulating Attack Chains","authors":"F. Baiardi, F. Tonelli, A. D. R. D. Biase","doi":"10.1109/PDP.2016.50","DOIUrl":"https://doi.org/10.1109/PDP.2016.50","url":null,"abstract":"Haruspex is a suite of tools to assess and manage the risk posed by an information and communication technology system. The suite is built around the application of a Monte Carlo method to a scenario where intelligent agents implement chains of attacks to reach their goals. Some tools build a description of the agents, the target system, its vulnerabilities and the resulting attacks. Another tool applies a Monte Carlo method to this description, simulates the building of attack chains by the agents and it returns a database with samples it collects in the simulations. Further tools analyze this database to select countermeasures. To validate the suite and verify it truthfully models attackers, it has been adopted in Locked Shield 2014, a network defense exercise with participants from 17 nations. The results of this exercise validate the designs of the tools.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126223700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Performance of Transactional Applications through Adaptive Transactional Memory","authors":"Thireshan Jeyakumaran, E. Atoofian, Yang Xiao, Zhen Li, A. Jannesari","doi":"10.1109/PDP.2016.85","DOIUrl":"https://doi.org/10.1109/PDP.2016.85","url":null,"abstract":"Transactional memory (TM) has become progressively widespread especially with hardware transactional memory implementation becoming increasingly available. In this paper, we focus on Restricted Transactional Memory (RTM) in Intel's Haswell processor and show that performance of RTM varies across applications. While RTM enhances performance of some applications relative to software transactional memory (STM), in some others, it degrades performance. We exploit this variability and present an adaptive system which is a static approach that switches between HTM and STM in transaction granularity. By incorporating a decision tree prediction module, we are able to predict the optimum TM system for a given transaction based on its characteristics. Our adaptive system supports both HTM and STM with the aim of increasing an application's performance. We show that our adaptive system has an average overall speedup of 20.82% over both TM systems.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125469168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Architectural Design Space Exploration for Heterogeneous Manycores","authors":"Benard Xypolitidis, R. Shabani, Satej V. Khandeparkar, Zain-ul-Abdin, Süleyman Savas, T. Nordström","doi":"10.1109/PDP.2016.79","DOIUrl":"https://doi.org/10.1109/PDP.2016.79","url":null,"abstract":"Today many of the high performance embedded processors already contain multiple processor cores and we see heterogeneous manycore architectures being proposed. Therefore it is very desirable to have a fast way to explore various heterogeneous architectures through the use of an architectural design space exploration tool, giving the designer the option to explore design alternatives before the physical implementation. In this paper, we have extended Heracles, a design space exploration tool for (homogeneous) manycore architectures, to incorporate different types of processing cores, and thus allow us to model heterogeneity. Our tool, called the Heterogeneous Heracles System (HHS), can besides the already supported MIPS core also include OpenRISC cores. The new tool retains the possibility available in Heracles to perform register transfer level (RTL) simulations of each explored architecture in Verilog as well as synthesizing it to field-programmable gate arrays (FPGAs). To facilitate the exploration of heterogeneous architectures, we have also extended the graphical user interface (GUI) to support heterogeneity. This GUI provides options to configure the types of core, core settings, memory system and network topology. Some initial results on FPGA utilization are presented from synthesizing both homogeneous and heterogeneous manycore architectures, as well as some benchmark results from both simulated and synthesized architectures.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130796058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing and Improving Memory Access Patterns of Large Irregular Applications on NUMA Machines","authors":"Artur Mariano, M. Diener, C. Bischof, P. Navaux","doi":"10.1109/PDP.2016.37","DOIUrl":"https://doi.org/10.1109/PDP.2016.37","url":null,"abstract":"Improving the memory access behavior of parallel applications is one of the most important challenges in high-performance computing. Non-Uniform Memory Access (NUMA) architectures pose particular challenges in this context: they contain multiple memory controllers and the selection of a controller to serve a page request influences the overall locality and balance of memory accesses, which in turn affect performance. In this paper, we analyze and improve the memory access pattern and overall memory usage of large-scale irregular applications on NUMA machines. We selected HashSieve, a very important algorithm in the context of lattice-based cryptography, as a representative example, due to (1) its extremely irregular memory pattern, (2) large memory requirements and (3) unsuitability to other computer architectures, such as GPUs. We optimize HashSieve with a variety of techniques, focusing both on the algorithm itself as well as the mapping of memory pages to NUMA nodes, achieving a speedup of over 2x.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130983943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPU Acceleration of Smoothed Particle Hydrodynamics for the Navier-Stokes Equations","authors":"Yingrui Wang, Leisheng Li, Jingtao Wang, R. Tian","doi":"10.1109/PDP.2016.28","DOIUrl":"https://doi.org/10.1109/PDP.2016.28","url":null,"abstract":"Although there exist much work on GPU acceleration on the SPH method, the focus so far has been on the Euler equations in fluid mechanics. This paper presents GPU acceleration on the SPH method for the Navier-Stokes equations for both solid and fluid mechanics. We investigate and compare three CPU-GPU coupling models in terms of one large-scale parallel application code: (1) CPU→GPU (to only run hotspots on GPU), (2) GPU-alone (to run the whole of simulation on GPU), and (3) CPU||GPU (to treat CPU and GPU as equivalent processors). A common issue to the three models, \"easy code transplant onto GPU\", is emphasized. Optimizations on particle indexing and particle interaction on GPU, which are of unique importance to a SPH code, are addressed. Numerical experiments are finally performed and 4x, 10x, 16x speedups are observed for the three coupling models, respectively, with reference to single CPU core. Among the three, the fastest model, the \"CPU||GPU\" model, further undergoes scalability tests on a cluster of 6 heterogeneous nodes and shows 90+% parallel efficiency.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127064191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ALMOS Many-Core Operating System Extension with New Secure-Enable Mechanisms for Dynamic Creation of Secure Zones","authors":"M. M. Real, Vincent Migliore, Vianney Lapôtre, G. Gogniat","doi":"10.1109/PDP.2016.92","DOIUrl":"https://doi.org/10.1109/PDP.2016.92","url":null,"abstract":"Many-core architectures are becoming a major execution platform in order to face the increasing number of applications to be executed in parallel. Such an approach is very attractive in order to offer users with high performance. However it introduces some key challenges in terms of security as some malicious applications may compromise the whole system. A defense-in-depth approach relying on hardware and software mechanisms is thus mandatory to increase the level of protection. This work focuses on the Operating System (OS) level and proposes a set of operating system services able to dynamically create physical isolated secure zones for sensitive applications in many-core platforms. These services are integrated into the ALMOS OS deployed in the TSAR many-core architecture, and evaluated in terms of security level and induced performance overhead.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133799344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}