{"title":"On Determining Multiple Optimal Parenthesizations for Matrix Chain Products and Scheduling the Corresponding Task Graphs","authors":"Khaoula Bezzina, Bchira Ben Mabrouk, Z. Mahjoub","doi":"10.1109/HPCS.2017.61","DOIUrl":"https://doi.org/10.1109/HPCS.2017.61","url":null,"abstract":"We are interested in an easy combinatorial optimization problem having several applications in the real world, namely the matrix chain product problem that may be solved by a well known dynamic programming algorithm (DPA). Our contribution is two-fold. It first consists in the design of an approach based on the DPA for the determination of multiple optimal solutions i.e. optimal parenthesizations (OPs) which may be represented by binary in-trees (BITs). Since our aim is to efficiently parallelize the computation of the resulting product matrix, we define for this purpose a particular inter-OPs comparative criterion i.e. the cost of a critical path in each BIT. Afterwards, we design different schedulings for the corresponding BIT task graphs (TGs). The procedure begins by the construction of the earliest and latest level partitions (LPs) of the TG. Then, after choosing three granularity sizes, i.e coarse, medium and mixed coarse-medium grains, and given an arbitrary number of processors, the proposed schedulings follow a level-per-level scanning of LPs of the TG. In addition to the design of efficient schedulings, we also determine the minimum number of processors to schedule the TG in minimal time. Our contribution is validated by an experimental study achieved on a series of input data (chain lengths, matrix sizes and number of processors) permitting to establish fine comparisons between the determined OPs and the designed schedulings, thus illustrating the practical interest of the study.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121650337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A First Investigation on the Dynamics of Two Delayed Neurons through Fuzzy Transform Approximation","authors":"S. Tomasiello","doi":"10.1109/HPCS.2017.74","DOIUrl":"https://doi.org/10.1109/HPCS.2017.74","url":null,"abstract":"Fuzzy transform is a promising approximation technique. In this paper, we use it to approximate a delayed function in a two-neuron system. We formally study the effect of this approximation, by elucidating the dynamics of the resulting discrete system. A linear stability analysis has been performed, with a first investigation on the possible bifurcations. Hopf bifurcation can occur under certain conditions.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116288796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Evaluation of a Parallel Dynamic Programming Algorithm for Solving the 1D Array Partitioning Problem","authors":"H. Salhi, Bchira Ben Mabrouk, Z. Mahjoub","doi":"10.1109/HPCS.2017.59","DOIUrl":"https://doi.org/10.1109/HPCS.2017.59","url":null,"abstract":"We address the 1D array partitioning problem (1D-APP), an easy combinatorial optimization problem, for which an exact dynamic programming algorithm (DPA) is known in the literature. The DPA is structured in a perfect three DO-loop nest (3DLN) with affine loop bounds. Due to its cubic complexity which may be too time consuming for large size real world problems, we propose a parallelization approach (PA). The latter starts by a dependence analysis within the nest (presented in a previous work) permitting to derive several versions of the original DPA then keep the (theoretically) best one. Considering this latter, a 3DLN, our contribution detailed here first consists in choosing two task segmentations corresponding to two grain sizes i.e. fine (resp. medium) grain where a grain corresponds to the body of the third (resp. second) loop of the 3DLN. Afterwards, we construct particular level decompositions (LDs) of the corresponding layered task graphs and design, when an arbitrary number of processors is available, several schedulings (4 in the fine grain case and 2 in the medium grain case) based on scanning the levels of the LDs with and without inter-level overlapping. For each case the makespans of the schedulings are explicitly determined and analysed. Our theoretical contribution is validated through a series of simulations achieved on different input data and for different numbers of available processors. This permits to establish a fine comparison between the different scheduling thus showing their respective efficiencies.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126876778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CUDA Based Parallel Implementations of Space-Saving on a GPU","authors":"M. Cafaro, I. Epicoco, G. Aloisio, Marco Pulimeno","doi":"10.1109/HPCS.2017.108","DOIUrl":"https://doi.org/10.1109/HPCS.2017.108","url":null,"abstract":"We present four CUDA based parallel implementations of the Space-Saving algorithm for determining frequent items on a GPU. The first variant exploits the open-source CUB library to simplify the implementation of a user's defined reduction, whilst the second is based on our own implementation of the parallel reduction. The third and the fourth, built on the previous variants, are meant to improve the performance by taking advantage of hardware based atomic instructions. In particular, we implement a warp based ballot mechanism to accelerate the Space-Saving updates. We show that our implementation of the parallel reduction, coupled with the ballot based update mechanism, is the fastest, and provides extensive experimental results regarding its performance.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126408060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extending OmpSs to Support Data Analytics Workload","authors":"Marcos Maroñas","doi":"10.1109/HPCS.2017.136","DOIUrl":"https://doi.org/10.1109/HPCS.2017.136","url":null,"abstract":"In the era of big data, new scientific applications such as those used in astronomy [1] are emerging and challenging High Performance Computing (HPC) systems and software. Traditionally, HPC applications were compute-bounded, with a light use of the I/O capabilities at the start and end of the execution. In contrast, emergent applications present data-intensive behaviors arising several new challenges to be faced by hardware and software.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127650105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When is the Right Time to Start the Fault Tolerance Protection?","authors":"Jorge Villamayor, Dolores Rexachs, E. Luque","doi":"10.1109/HPCS.2017.70","DOIUrl":"https://doi.org/10.1109/HPCS.2017.70","url":null,"abstract":"In High Performance Computing, Fault Tolerance (FT) becomes a primary concern due to the constant growing and continuous aging of hardware components, which rise failures probability. Failures produce performance degradation to the environment and affect significantly users expected execution time. Rollback-Recovery protocols represent a fundamental component to protect and restore users parallel application execution, although this protection comes with an overhead. This paper proposes a First Protection Point model, which determines the starting point to introduce FT protection gaining benefits in terms of total execution time including failures. A characterization of Rollback-Recovery protocols applied on parallel applications is performed, to obtain key factors for the model design. This model can help users determine which checkpoints can be removed from the application execution when they are used for FT protection purposes, reducing the overhead and at the same time keeping high availability. An analytic model evaluation is developed to show the inflexion point where FT protection starts to provide benefits for users. Finally, three experimental environments are setup, using two private clusters and a public cluster configured in a well-known cloud Amazon EC2. A coordinated checkpoint facility is applied on NAS benchmark applications such as: CG, BT and LU to evaluate the proposed model, obtaining overhead impact reduction for provided Fault Tolerance.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133675071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using the Application Signature to Detect Inefficiencies Generated by Mapping Policies in Parallel Applications","authors":"C. Rangel, Alvaro Wong, Dolores Rexachs, E. Luque","doi":"10.1109/HPCS.2017.85","DOIUrl":"https://doi.org/10.1109/HPCS.2017.85","url":null,"abstract":"The execution of HPC applications in multicore environments can occasionally use the resources in an inefficient way. There are idle times during the application execution that can be caused by synchronization or message passing collisions. We define this idle time as an application inefficiency and may be caused by the message passing collisions at different types of interconnections in the compute nodes. We propose a methodology to characterize the application's execution in order to analyze and detect these inefficiencies in a bounded time as well as to locate on which parallel segments of the application code (phases) these inefficiencies are generated. The parallel segments of code (phases) represent the most relevant application behavior and are obtained by the application's characterization using the PAS2P tool. The tool allows us to predict the execution time by the generation of the application signature, which is composed of phases. Taking advantage of the prediction quality and the time to obtain the prediction of application performance, we propose modeling the factors that potentially influence the application's execution time, especially characterizing the behavior during the execution time of these phases. We performed experimental validation using signatures of NAS Parallel benchmarks in order to detect and model the inefficiencies in the application phases.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130427474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Deployment System for Highly Heterogeneous and Dynamic Environments","authors":"Leila Abidi, C. Cérin, W. Saad","doi":"10.1109/HPCS.2017.98","DOIUrl":"https://doi.org/10.1109/HPCS.2017.98","url":null,"abstract":"In this paper we introduce the scientific issues related to deploying, in a multi-Cloud architecture, an infrastructure using the publish-subscribe paradigm for orchestrating the components of a framework that execute scientific workflows in highly heterogeneous and dynamic environments. More specifically we are in search of the adequate approaches to build deployment systems for heterogeneous and highly dynamic environments. As the use case for this paper, we focus on the deployment of the RedisDG workflow engine. This insight serves to make concrete our ideas, and we propose an architecture for the deployment system as well as different scenarios and demonstrations of this scientific workflow engine, deployed 'as a Service'. This paper can be considered as the first return from our experience in deploying, automatically, an IaaS (Infrastructure as a Service) for executing scientific workflows, on-demand for heterogeneous and dynamic environments.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130674930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling a Photonic Network for Exascale Computing","authors":"José Duro, S. Petit, J. Sahuquillo, M. E. Gómez","doi":"10.1109/HPCS.2017.82","DOIUrl":"https://doi.org/10.1109/HPCS.2017.82","url":null,"abstract":"Photonics technology has become a promising and viable alternative for both on-chip and off-chip computer networks of future Exascale systems. Nevertheless, this technology is not mature enough yet in this context, so research efforts focusing on photonic networks are still required to achieve realistic suitable network implementations. In this context, system-level photonic network simulators can help to guide designers to assess the multiple design choices. Most current research is done on electrical network simulators, whose components work widely different from photonics components. Moreover, photonics technology adds new components that are not present in electrical networks. This paper discusses how a photonics simulation tool can be built by extending an electrical simulation framework. We summarize and compare the working behavior of both technologies -electrical and photonics, and discuss the rationale behind the proposed extensions. Among others, the devised extensions model optical routers, wavelength-division multiplexing, circuit switching, and specific routing algorithms. This work is aimed to provide support to investigate off-chip optical networks in the context of the European Exascale System Interconnect and Storage project (ExaNeSt) project. The experiments presented in this paper study multiple realistic photonic networks configurations and have been performed with excerpts of real traces. Experimental results show that, compared to electrical networks, optical networks can reduce the execution time of the workload by several orders of magnitude. Our study reveals that future optical technologies presenting a 3.2 Tbps aggregate link bandwidth will not provide additional performance benefits over state-of-the-art 1.6 Tbps optical links across the studied workloads, but 1.6 Tbps network links are enough to achieve the highest optical performance on computer networks. Regarding the link configuration, the bandwidth per optical channel is the parameter with highest impact on the network delay and so on the execution time, while for a given optical bandwidth per channel the better strategy is to reduce the phit size.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116600601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TurBase: A Software Platform for Research in Experimental and Numerical Fluid Dynamics","authors":"R. Benzi, Luca Biferale, F. Bonaccorso, H. Clercx, Alessandro Corbetta, W. Mobius, F. Toschi, F. Salvadore, C. Cacciari, G. Erbacci","doi":"10.1109/HPCS.2017.18","DOIUrl":"https://doi.org/10.1109/HPCS.2017.18","url":null,"abstract":"We present a software infrastructure for the research community working on turbulence and complex flows (TurBase), an easily accessible web platform for high quality data. Its main goal is to host, standardize and manage a large collections of heterogeneous experimental and numerical data sets from high-end European fluid dynamics experimental facilities and from High Performance Computational centres. TurBase offers scalable performances when accessing/uploading/searching data, providing at the same time maximum flexibility and power (through Jupyter notebooks) when doing online computation directly on big datasets","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131286504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}