{"title":"[Title page iii]","authors":"Los Alamitos, C. Washington, bullet Tokyo","doi":"10.1109/ancs.2011.2","DOIUrl":"https://doi.org/10.1109/ancs.2011.2","url":null,"abstract":"Presents the title page of the proceedings record.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125159560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Modeling Interconnection Networks of Exascale Systems with OMNet++","authors":"P. Yébenes, J. Escudero-Sahuquillo, P. García, F. Quiles","doi":"10.1109/PDP.2013.36","DOIUrl":"https://doi.org/10.1109/PDP.2013.36","url":null,"abstract":"One of the objectives of the decade for High-Performance Computing (HPC) systems is to reach the exascale level of computing power before 2018, which will require strong efforts in their design. In that sense, high-speed, low-latency interconnection networks are essential elements of exascale HPC systems. Indeed, the performance of the whole system depends on that of the interconnection network. In order to develop and test new techniques suited to exascale HPC systems, software-based network simulators are commonly used. As developing a network simulator from scratch is a difficult task, several platforms help developers, OMNeT++ being one of the most popular. In this paper, we propose a new generic network simulator that exploits the features of the OMNeT++ framework. The proposed tool is the first step towards modeling the high-performance interconnection networks of exascale HPC systems: the message switching layer, routing and arbitration algorithms, and buffer organizations have been modeled according to the current and expected characteristics of these systems. In addition, the tool has been designed so that large networks can be simulated. Simulation results, validated against real systems, show the accuracy of the model.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121058616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Core Mapping into an Irregular Network on Chip - Features Extraction System for Automatic Speech Recognition Case Study","authors":"P. Dziurzański, T. Maka","doi":"10.1109/PDP.2013.79","DOIUrl":"https://doi.org/10.1109/PDP.2013.79","url":null,"abstract":"In this paper, we propose a scheme for mapping IP cores into an irregular Network on Chip, using as an example a module dedicated to feature extraction for an automatic speech recognition system. We estimate the core sizes for various frame sizes and overlaps and then try to place heavily communicating cores close to each other, testing a number of widths in the 2D Rectangular Strip Packing problem. The resulting range of solutions allows us to pick one that is beneficial in terms of both area and transfers between the system cores.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114896381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Approach for a Power Efficient General Purpose Supercomputer","authors":"M. Bach, J. Cuveland, H. Ebermann, D. Eschweiler, J. Gerhard, S. Kalcher, M. Kretz, V. Lindenstruth, H. Ludde, Manfred Pollok, D. Rohr","doi":"10.1109/PDP.2013.55","DOIUrl":"https://doi.org/10.1109/PDP.2013.55","url":null,"abstract":"Computers are essential in research and industry, but they are also significant contributors to worldwide power consumption. The LOEWE-CSC supercomputer addresses this problem by setting new standards in environmental compatibility as well as energy and cooling efficiency for high-performance and general-purpose computing. Designing a pervasively energy-efficient compute center requires improvements in multiple fields. The hosting low-loss compute center operates at a cooling overhead below 8% of the computer power. General-purpose graphics processing units provide more compute performance per watt than standard processors. A balanced hardware configuration ensures that most of the compute power is available to users who employ optimized applications. Clever algorithms enable the user to fully exploit the computational potential and avoid wasting power when processors idle, which is often a consequence of inefficient programming. Using commodity servers, the LOEWE-CSC operated at 740 MFlops/W during a Linpack benchmark run and ranked 8th in the Green500 list of November 2010. These innovations provide a fundamental step towards cost-effective, environment-friendly exascale computing and IT operation.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115988341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-Efficient Project Management Based on Distributed Processing Model","authors":"Grzegorz Pawinski, K. Sapiecha","doi":"10.1109/PDP.2013.30","DOIUrl":"https://doi.org/10.1109/PDP.2013.30","url":null,"abstract":"In this paper, a resource-constrained project scheduling problem (RCPSP) aiming at project cost minimization is investigated. RCPSP is a well-known NP-hard optimization problem. A metaheuristic algorithm was adopted to solve the problem when applied to Critical Chain Project Management (CCPM). It starts with an initial schedule and searches for the cheapest solution satisfying given time constraints. A distributed version of the algorithm is proposed to reduce computation time. Independent processes on remote computers (workers) calculate different schedule modifications at the same time and send results back to a server. The server uses multithreading to distribute project data and search parameters to the workers. The number of workers needed to achieve the best performance was estimated. The computational results show that distributed processing greatly reduces the time needed to obtain results in comparison with centralized processing.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"298 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122793584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Evaluation of Efficient Leader Election Algorithms for Crash-Recovery Systems","authors":"Carlos Gómez-Calzado, M. Larrea, Iratxe Soraluze Arriola, A. Lafuente, Roberto Cortiñas","doi":"10.1109/PDP.2013.33","DOIUrl":"https://doi.org/10.1109/PDP.2013.33","url":null,"abstract":"This paper presents an evaluation of three communication-efficient algorithms implementing the Omega class of failure detectors, which provides an eventual leader election functionality, in distributed systems where processes can crash and recover. Communication efficiency means that eventually only a correct process, i.e., the elected leader, keeps sending a message periodically to the rest of the processes. The first algorithm relies on the use of stable storage to store the identity of the leader and an incarnation number. The second algorithm does not use stable storage, but requires a majority of correct processes. Also, it is near-communication-efficient, since besides the leader, unstable processes, i.e., those that crash and recover infinitely often, may send messages periodically before they receive a message from the leader. Finally, the third algorithm neither uses stable storage nor requires a majority of correct processes, but assumes that each process has access to a nondecreasing and persistent local clock. Using the OMNeT++ network simulation framework, we evaluate the performance and the quality of service provided by these algorithms, in terms of the number of messages exchanged among processes and the capability of the failure detector to provide a single leader, respectively.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121388378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pairwise Sequence Alignment Method for Distributed Shared Memory Systems","authors":"Alberto Montañola, C. Roig, P. Hernández","doi":"10.1109/PDP.2013.69","DOIUrl":"https://doi.org/10.1109/PDP.2013.69","url":null,"abstract":"One of the initial key steps of the multiple sequence alignment problem is the pairwise alignment of all pairs of genomic sequences involved. With the increased requirements to align several thousand sequences, it is necessary to find efficient new ways to align as many pairs of sequences as possible. Traditional sequential algorithms are limited by their memory and processing capabilities, while parallel implementations running over clusters are able to process considerably more sequences. Nowadays, computer systems are capable of running several processing threads using a shared memory model, which can be combined with a distributed memory model. This paper presents a parallel pairwise aligner based on Smith-Waterman capable of processing large numbers of sequences with a small memory footprint. Our implementation is based on a message-passing library, such as MPI, combined with a threading library, such as pthreads. Our experiments show the gain in efficiency for processing different numbers of sequences with different numbers of threads.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127702989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Iterative Solution of Numerical Simulation Problems on Infiniband and Ethernet Clusters via the P2PSAP Self-Adaptive Protocol","authors":"S. R. Tembo, Nguyen The Tung, D. E. Baz","doi":"10.1109/PDP.2013.25","DOIUrl":"https://doi.org/10.1109/PDP.2013.25","url":null,"abstract":"The distributed iterative solution of numerical simulation problems on Infiniband or Ethernet Clusters via the P2PDC environment is studied. The P2PDC decentralized environment is dedicated to task parallel applications. It has been designed for the solution of large scale numerical simulation problems via distributed iterative algorithms. The P2PDC environment is based on the P2PSAP self-adaptive communication protocol. New functionalities of the P2PSAP communication protocol aimed at using Infiniband clusters are presented. A series of computational results is presented and analyzed.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131211244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consistency Check through O-GEHL Predictors","authors":"E. Atoofian","doi":"10.1109/PDP.2013.39","DOIUrl":"https://doi.org/10.1109/PDP.2013.39","url":null,"abstract":"Transactional Memory (TM) is a promising paradigm to facilitate parallel programming for multicore processors. In software implementations of TM (STMs), transactions rely on a global clock to maintain consistency of transactional data. While this method is simple to implement, it results in significant timing overhead if transactions commit frequently. The alternative approach is Thread Local Clock (TLC), which exploits decentralized local variables to maintain consistency in transactions. However, TLC may increase false aborts and degrade the performance of STMs. In this paper, we introduce Adaptive Clock (AC), which dynamically selects one of the two validation techniques based on the probability of conflicts. AC is a speculative approach and relies on O-GEHL predictors to speculate on future conflicts. We have incorporated AC into TL2 and compared the performance of the new implementation with the original STM using the Stamp v0.9.10 benchmark suite. Our results reveal that AC is effective and improves the performance of transactional applications by up to 33%.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132678683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalability and Efficiency of Database Queries on Future Many-Core Systems","authors":"P. Petrides, Andreas Diavastos, C. Christofi, P. Trancoso","doi":"10.1109/PDP.2013.14","DOIUrl":"https://doi.org/10.1109/PDP.2013.14","url":null,"abstract":"Decision Support System (DSS) workloads are known to be among the most time-consuming database workloads that process large data sets. Traditionally, DSS queries have been accelerated using large-scale multiprocessors. In this work we exploit the benefits of using future many-core architectures, more specifically on-chip clustered many-core architectures. To achieve this goal we propose different representative data-parallel versions of the original database scan and join algorithms. We also study the impact on performance when on-chip memory, shared among all cores, is used as a prefetching buffer. For our experiments we study the behaviour of three queries from the standard DSS benchmark TPC-H executing on the Intel Single chip Cloud Computer experimental processor (Intel SCC). Our results show that parallelism can be well exploited by such architectures and how important it is to have a balance between computation and data intensity. Moreover, our experimental results show performance improvements of 5x and 10x relative to the corresponding query implementation without data prefetching. Finally, we show how the system could be used efficiently to achieve high power-performance efficiency when using the proposed prefetching buffer.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131293751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}