Diego F. Bermúdez Garzón, Crispín Gómez Requena, P. López, M. E. Gómez
{"title":"Speeding-up the fault-tolerance analysis of interconnection networks","authors":"Diego F. Bermúdez Garzón, Crispín Gómez Requena, P. López, M. E. Gómez","doi":"10.1109/HPCSim.2015.7237035","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237035","url":null,"abstract":"Analyzing the fault-tolerance of interconnection networks implies checking the connectivity of each source-destination pair. The size of the exploration space of such an operation skyrockets with the network size and with the number of link faults. However, this problem is highly parallelizable since the exploration of each path between a source-destination pair is independent of the other paths. This paper presents an approach to analyze the fault-tolerance degree of multistage interconnection networks using GPUs in order to speed it up. This approach uses CUDA as the parallel programming tool on a GPU in order to take advantage of all available cores. Results show that the execution time of the fault-tolerance exploration can be significantly reduced.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129899834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
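The exploration described in this abstract can be sketched sequentially as follows. This is a minimal illustration, not the authors' CUDA implementation: it assumes a generic directed graph given as a list of links (the toy two-stage network `LINKS`/`PAIRS` and all helper names are hypothetical), enumerates every set of k faulty links, and checks each source-destination pair with a breadth-first search. The number of fault sets, C(|links|, k), is what makes the space grow so quickly, and the per-pair checks inside each fault set are the independent units of work a GPU version could spread across threads.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network: sources s0, s1 -> switches a, b -> destinations d0, d1.
LINKS = [("s0", "a"), ("s0", "b"), ("s1", "a"), ("s1", "b"),
         ("a", "d0"), ("a", "d1"), ("b", "d0"), ("b", "d1")]
PAIRS = [(s, d) for s in ("s0", "s1") for d in ("d0", "d1")]


def build_adjacency(links, faulty):
    """Adjacency lists of the network with the faulty links removed."""
    adj = {}
    for u, v in links:
        if (u, v) not in faulty:
            adj.setdefault(u, []).append(v)
    return adj


def reachable(adj, src, dst):
    """Breadth-first search: is dst reachable from src?"""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def tolerates(links, pairs, k):
    """True if every pair stays connected under every combination of k link faults."""
    for faulty in combinations(links, k):
        adj = build_adjacency(links, set(faulty))
        # Each pair check is independent of the others (the parallelizable part).
        if not all(reachable(adj, s, d) for s, d in pairs):
            return False
    return True


print(tolerates(LINKS, PAIRS, 1))  # True: every single link fault is survivable
print(tolerates(LINKS, PAIRS, 2))  # False: e.g. losing both links out of s0
```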
{"title":"A Reduced Complexity Instruction Set architecture for low cost embedded processors","authors":"Hanni B. Lozano, M. Ito","doi":"10.1109/HPCSim.2015.7237068","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237068","url":null,"abstract":"Implementing advanced DSP applications in software on low-power, low-cost embedded RISC processors is a challenging task because of ISA shortcomings that inhibit performance. An embedded CISC processor can potentially deliver higher performance, but not enough to meet the demands of complex DSP applications. We present a novel ISA that eliminates unnecessary overheads and improves the performance of embedded DSP applications on resource-constrained processors. The implementation of the novel mixed ISA requires minor modifications to the base architecture, which translate to less than a 5% increase in total power consumption. The novel ISA reduces the number of instructions used to implement a complex Fast Fourier Transform by less than half and speeds up processing threefold, leading to a substantial improvement in energy efficiency. Simulation results of a number of embedded benchmark programs show an average twofold increase in performance compared to a RISC processor.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127610487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Fernández, Diego Scardaci, G. Sipos, D. Wallom, Yin Chen
{"title":"The user support programme and the training infrastructure of the EGI Federated Cloud","authors":"E. Fernández, Diego Scardaci, G. Sipos, D. Wallom, Yin Chen","doi":"10.1109/HPCSim.2015.7237016","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237016","url":null,"abstract":"The EGI Federated Cloud is a standards-based, open cloud system, together with its enabling technologies, that federates institutional clouds to offer a scalable computing platform for data- and/or compute-driven applications and services. The EGI Federated Cloud is based on open standards and open source Cloud Management Frameworks and offers its users IaaS, PaaS and SaaS capabilities and interfaces tuned towards the needs of users in research and education. The federation enables scientific data, workloads, simulations and services to span multiple administrative locations, allowing researchers and educators to access and exploit the distributed resources as an integrated system. The EGI Federated Cloud collaboration established a user support model and a training infrastructure to raise the visibility of this service within European scientific communities, with the overarching goal of increasing adoption and, ultimately, the usage of e-infrastructures for the benefit of the whole European Research Area. The paper describes these scalable user support and training infrastructure models. The training infrastructure is built on top of the production sites to reduce costs and increase its sustainability. Appropriate design solutions were implemented to reduce the security risks due to the cohabitation of production and training resources on the same sites. The EGI Federated Cloud educational program foresees different kinds of training events, from basic tutorials that spread knowledge of this new infrastructure to events devoted to specific scientific disciplines, which teach how to use tools already integrated in the infrastructure with the assistance of experts identified in the EGI community. The main success metric of this educational program is the number of researchers willing to try the Federated Cloud, who are steered into the EGI world by the EGI Federated Cloud Support Team through a formal process that brings them from initial tests to fully exploiting the production resources.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115448379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient implementation of fuzzy edge detection using GPU in MATLAB","authors":"F. Hoseini, A. Shahbahrami","doi":"10.1109/HPCSim.2015.7237100","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237100","url":null,"abstract":"Edge detection is one of the most important concepts in image processing; it is used as an indicator for processing and extracting some border characteristics at low levels, and for detecting and finding objects at high levels. Due to the inherently parallel nature of edge detection algorithms, they are well suited for implementation on a Graphics Processing Unit (GPU). The first part of this paper aims to detect and retouch image edges using a fuzzy inference system. In the first step, RGB images are converted to grayscale images. In the second step, the input images are converted from the uint8 class to the double class. In the third step, a fuzzy inference system is defined with two inputs. Fuzzy inference system rules and membership functions are applied to these two inputs. Black output pixels indicate areas with edges and white output pixels indicate areas without edges. In the second part of this paper, the performance of the fuzzy edge detection algorithm is improved on a GPU platform by exploiting data-level parallelism and the scatter/gather parallel communication pattern in the MATLAB environment. The experimental results show that performance is improved by up to 11.8x for different image sizes.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132556709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
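The three steps in this abstract can be illustrated with a small pure-Python sketch. It assumes details the abstract does not give: the two fuzzy inputs are taken to be horizontal and vertical intensity differences, each with a Gaussian "zero" membership function (width `sigma`, set to 0.1 here by assumption), and the two rules are combined as a zero-order Sugeno system with singleton outputs (white = 1, black = 0) rather than whatever defuzzification the paper uses. The grayscale weights are the ones documented for MATLAB's `rgb2gray`, and dividing by 255 mirrors what `im2double` does for uint8 data. All function names and the test image are hypothetical. The code runs sequentially; the per-pixel loop body is the data-parallel part a GPU version would distribute.

```python
import math


def to_gray_double(rgb):
    """Steps 1-2: uint8 RGB pixels -> grayscale doubles in [0, 1]."""
    return [[(0.2989 * r + 0.5870 * g + 0.1140 * b) / 255.0 for (r, g, b) in row]
            for row in rgb]


def gradients(img):
    """Assumed fuzzy inputs: forward differences along x and y (0 at the last column/row)."""
    h, w = len(img), len(img[0])
    ix = [[img[i][j + 1] - img[i][j] if j + 1 < w else 0.0 for j in range(w)] for i in range(h)]
    iy = [[img[i + 1][j] - img[i][j] if i + 1 < h else 0.0 for j in range(w)] for i in range(h)]
    return ix, iy


def zero_mf(g, sigma):
    """Gaussian membership of a gradient value in the fuzzy set 'zero'."""
    return math.exp(-(g * g) / (2.0 * sigma * sigma))


def fuzzy_edges(rgb, sigma=0.1):
    """Step 3: two-input fuzzy inference; values near 0 (black) mark edges."""
    ix, iy = gradients(to_gray_double(rgb))
    out = []
    for row_x, row_y in zip(ix, iy):
        out_row = []
        for gx, gy in zip(row_x, row_y):
            mx, my = zero_mf(gx, sigma), zero_mf(gy, sigma)
            white = min(mx, my)              # rule 1: Ix is zero AND Iy is zero -> white
            black = max(1.0 - mx, 1.0 - my)  # rule 2: Ix not zero OR Iy not zero -> black
            # Weighted average of the singleton outputs white=1 and black=0.
            out_row.append((white * 1.0 + black * 0.0) / (white + black))
        out.append(out_row)
    return out


# Hypothetical 3x4 test image: two black columns next to two white columns.
image = [[(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)] for _ in range(3)]
edges = fuzzy_edges(image)
print([round(v, 2) for v in edges[0]])  # [1.0, 0.0, 1.0, 1.0]
```

With these two rules the firing strengths always sum to 1, so the output reduces to min(mx, my); the explicit weighted average is kept only to show where a different rule base or output set would plug in.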
{"title":"Large Java arrays and their applications","authors":"Piotr Wendykier, B. Borucki, K. Nowinski","doi":"10.1109/HPCSim.2015.7237077","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237077","url":null,"abstract":"All current implementations of Java Virtual Machines only allow the creation of one-dimensional arrays with fewer than 2^31 elements. In addition, since Java lacks true multidimensional arrays, most numerical libraries use one-dimensional arrays to store multidimensional data. With the current limitation, it is not possible to store volumes of size larger than 1290^3. On the other hand, the data from scientific simulations or medical scanners continuously grow in size and it is not uncommon to go beyond that limit. This work addresses the problem of the maximal size of one-dimensional Java arrays. JLargeArrays is a Java library of one-dimensional arrays that can store up to 2^63 elements. Performance comparison with native Java arrays and the Fastutil library shows that JLargeArrays is the fastest solution overall. Possible applications in Java collections as well as numerical and visualization frameworks are also discussed.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"58 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131294042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
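The 1290^3 figure in this abstract follows directly from the array-length limit: a flattened n x n x n volume needs n^3 elements, and 1290^3 = 2,146,689,000 is the largest cube that fits below 2^31 (1291^3 = 2,151,685,171 already exceeds it). The short check below treats the limit as 2^31 - 1, the largest signed 32-bit index, and applies the same arithmetic to the 2^63 bound quoted for JLargeArrays. The helper name is hypothetical and the code says nothing about how JLargeArrays itself stores data.

```python
def largest_cube_side(max_elements):
    """Largest n such that an n*n*n volume fits in a 1-D array of max_elements."""
    n = round(max_elements ** (1 / 3))  # float estimate, corrected below with exact ints
    while n ** 3 > max_elements:
        n -= 1
    while (n + 1) ** 3 <= max_elements:
        n += 1
    return n


print(largest_cube_side(2**31 - 1))  # 1290  (native Java array limit)
print(largest_cube_side(2**63 - 1))  # 2097151  (since 2**63 == 2097152**3)
```

The float cube root is only a starting guess; the two integer loops make the result exact even where floating-point rounding lands one off.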
{"title":"Optimizing communications in multi-GPU Lattice Boltzmann simulations","authors":"E. Calore, D. Marchi, S. Schifano, R. Tripiccione","doi":"10.1109/HPCSim.2015.7237021","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237021","url":null,"abstract":"An increasingly large number of scientific applications run on large clusters based on GPU systems. In most cases the large-scale parallelism of the applications uses MPI, widely recognized as the de-facto standard for building parallel applications, while several programming languages are used to express the parallelism available in the application and map it onto the parallel resources available on GPUs. Regular grids and stencil codes are used in a subset of these applications, often corresponding to computational “Grand Challenges”. One such class of applications is Lattice Boltzmann Methods (LB), used in computational fluid dynamics. The regular structure of LB algorithms makes them suitable for processor architectures with a large degree of parallelism like GPUs. Scalability of these applications on large clusters requires a careful design of processor-to-processor data communications, exploiting all possibilities to overlap communication and computation. This paper looks at these issues, considering as a use case a state-of-the-art two-dimensional LB model that accurately reproduces the thermo-hydrodynamics of a 2D fluid obeying the equation of state of a perfect gas. We study in detail the interplay between data organization and data layout, data-communication options and the overlapping of communication and computation. We derive partial models of some performance features and compare them with experimental results for production-grade codes that we run on a large cluster of GPUs.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"49 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120815416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cosimo Palazzo, Andrea Mariello, S. Fiore, Alessandro D'Anca, D. Elia, Dean N. Williams, G. Aloisio
{"title":"A workflow-enabled big data analytics software stack for escience","authors":"Cosimo Palazzo, Andrea Mariello, S. Fiore, Alessandro D'Anca, D. Elia, Dean N. Williams, G. Aloisio","doi":"10.1109/HPCSim.2015.7237088","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237088","url":null,"abstract":"The availability of systems able to process and analyse large amounts of data has boosted scientific advances in several fields. Workflows provide an effective tool to define and manage large sets of processing tasks. In the big data analytics area, the Ophidia project provides a cross-domain big data analytics framework for the analysis of scientific, multi-dimensional datasets. The framework exploits a server-side, declarative, parallel approach for data analysis and mining. It also features a complete workflow management system to support the execution of complex scientific data analysis, schedule task submission, manage operator dependencies and monitor job execution. The workflow management engine allows users to perform a coordinated execution of multiple data analytics operators (both single and massive, i.e. parameter-sweep) in an effective manner. For the definition of big data analytics workflows, a JSON schema has been designed and implemented. To aid the definition of the workflows, a visual design language consisting of several symbols, named the Data Analytics Workflow Modelling Language (DAWML), has also been defined.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128911291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying patterns towards Algorithm Based Fault Tolerance","authors":"U. Kabir, D. Goswami","doi":"10.1109/HPCSim.2015.7237083","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237083","url":null,"abstract":"The checkpoint and recovery cost imposed by coordinated checkpoint/restart (CCP/R) is a crucial performance issue for high performance computing (HPC) applications. In comparison, Algorithm Based Fault Tolerance (ABFT) is a promising fault tolerance method with low recovery overhead, but it suffers from limited universal applicability and a lack of user transparency. In this paper we address the overhead problem of CCP/R and some of the limitations of ABFT, and propose a solution for ABFT based on algorithmic patterns. The proposed solution is a generic fault tolerance strategy for a group of applications that exhibit similar algorithmic (structural and behavioral) features. These features, together with the minimal fault recovery data (critical data), determine the fault tolerance strategy for the group of applications. We call this strategy a fault tolerance pattern (FTP). We demonstrate the idea of FTP with parallel iterative deepening A* (PIDA*) search, a generic search algorithm used to solve a wide range of discrete optimization problems (DOP). Theoretical analysis shows that our proposed solution performs better than CCP/R in terms of checkpoint and recovery time overhead. Furthermore, using FTP helps achieve separation of concerns, which facilitates user transparency.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127818255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance evaluation and improvement in cloud computing environment","authors":"Omar Khedher, M. Jarraya","doi":"10.1109/HPCSim.2015.7237109","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237109","url":null,"abstract":"Cloud computing covers a wide range of applications, including online services for the end user. It has become the new trend for most organizations to handle their business IT units. The services provided are becoming flexible because the resources and processing power available to each can be adjusted on the fly to meet changes in need [6]. However, infrastructures deployed in a cloud computing environment may induce significant performance penalties for demanding computing workloads. In our doctoral research, we aim to study, analyze, evaluate and improve performance in cloud computing environments based on different criteria. To achieve the thesis objectives, the research performed is based on a quantitative analysis of repeatable empirical experiments.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124663699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A resilient routing approach for Mobile Ad Hoc Networks","authors":"Ming-Yang Su, Chih-Wei Yang","doi":"10.1109/HPCSim.2015.7237102","DOIUrl":"https://doi.org/10.1109/HPCSim.2015.7237102","url":null,"abstract":"This paper presents a resilient routing algorithm that is more suitable than traditional routing algorithms, such as Ad-hoc On-demand Distance Vector (AODV) routing, for a Mobile Ad Hoc Network (MANET) with fast-moving or sparse nodes. Since AODV routing in MANETs is known for its advantageous properties, the proposed routing algorithm is based on AODV and is called RAODV (Resilient AODV). In the route discovery phase, RAODV differs from AODV: whereas AODV establishes only one routing path from a source node to the destination node, RAODV establishes as many routes as possible. Thus, when the primary route breaks, the node can immediately adopt an alternative route without any further route search. If no alternative route exists, the node transmits the route break information backward to instruct the previous node on the reverse route to select an alternative one, and so on. The proposed RAODV can reduce the number of route rediscovery procedures, and thus improve the packet loss rate and transmission delay, especially in sparse MANETs. The ns2 simulator was used to evaluate the performance of RAODV. In some cases, the proposed RAODV was able to reduce the packet loss rate by 72.61% compared to traditional AODV.","PeriodicalId":134009,"journal":{"name":"2015 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117237854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
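The alternative-route behaviour described in this abstract can be sketched as a search over per-node route lists. This is a centralized illustration under stated assumptions, not the RAODV protocol itself: it ignores route request/reply/error message formats, timers and sequence numbers, and assumes each node already holds an ordered list of next hops toward a single destination, primary first. The function name `find_route` and the topology are hypothetical. Backtracking by one node stands in for sending route-break information to the previous node on the reverse route.

```python
def find_route(next_hops, src, dst, broken):
    """Walk from src to dst, switching to alternative next hops on link breaks.

    next_hops: node -> list of next hops toward dst, primary route first.
    broken:    set of (node, next_hop) links that have failed.
    Returns the path taken, or None if no stored route survives.
    """
    path = [src]
    choice = {}  # node -> index of the next candidate hop to try
    while path:
        node = path[-1]
        if node == dst:
            return path
        hops = next_hops.get(node, [])
        i = choice.get(node, 0)
        # Skip failed links and hops already on the path (avoids loops).
        while i < len(hops) and ((node, hops[i]) in broken or hops[i] in path):
            i += 1
        if i < len(hops):
            choice[node] = i + 1     # resume here if this hop fails later
            path.append(hops[i])     # adopt the alternative immediately
        else:
            choice.pop(node, None)   # no alternative left at this node:
            path.pop()               # report the break back to the previous node
    return None


# Hypothetical topology: S keeps two routes to D, via A and via B -> C.
TABLES = {"S": ["A", "B"], "A": ["D"], "B": ["C"], "C": ["D"]}
print(find_route(TABLES, "S", "D", set()))                      # ['S', 'A', 'D']
print(find_route(TABLES, "S", "D", {("A", "D")}))               # ['S', 'B', 'C', 'D']
print(find_route(TABLES, "S", "D", {("A", "D"), ("C", "D")}))   # None
```

In the second call, A has no alternative after its link to D fails, so control returns to S, which switches to its stored route through B without a new discovery round; only when every stored route is broken (third call) would a real node fall back to route rediscovery.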