E. Mosca, I. Merelli, L. Milanesi, A. Clematis, D. D'Agostino
{"title":"A Parallel Implementation of the Stau-DPP Stochastic Simulator for the Modelling of Biological Systems","authors":"E. Mosca, I. Merelli, L. Milanesi, A. Clematis, D. D'Agostino","doi":"10.1109/PDP.2013.68","DOIUrl":"https://doi.org/10.1109/PDP.2013.68","url":null,"abstract":"In the last decade, different computing paradigms and modelling frameworks for the description and simulation of biochemical systems based on stochastic modelling have been proposed. From a computational point of view, many simulations of the model are necessary to identify the behaviour of the system. The execution of thousands of simulations can require a huge amount of time; therefore, the parallelization of these algorithms is highly desirable. In this work we discuss the different strategies that can be implemented for the parallelization of a space-aware τ-DPP variant, providing a C-MPI implementation of the system and discussing its performance based on the simulation of particle diffusion in a crowded environment.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125715202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QoS Manager for Energy Efficient Many-Core Operating Systems","authors":"Simon Holmbacka, D. Agren, S. Lafond, J. Lilius","doi":"10.1109/PDP.2013.53","DOIUrl":"https://doi.org/10.1109/PDP.2013.53","url":null,"abstract":"Upcoming many-core platforms are a hot topic these days, and this next-generation hardware places new focus on energy and thermal awareness. With increasingly dense packing of transistors, the system must be made energy aware so that it does not suffer from overheating and energy waste. As a step towards increased energy efficiency, we intend to add the notion of QoS handling to the OS level and to applications. We suggest the design of a QoS manager as a plug-in OS extension capable of providing applications with the necessary resources, leading to better energy efficiency.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131660327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Access to the DARIAH Bit Preservation Service for Humanities Research Data","authors":"D. Tonne, J. Rybicki, Stefan E. Funk, Peter Gietz","doi":"10.1109/PDP.2013.12","DOIUrl":"https://doi.org/10.1109/PDP.2013.12","url":null,"abstract":"Sustainable management of large amounts of research data is gaining in importance for research projects all over the world. The European project DARIAH aims to address this topic for the arts and humanities community. The DARIAH Bit Preservation, as a part of an archiving system for the arts and humanities, allows for a high performance, sustainable, and distributed storage of research data as basis of virtual research environments. A great challenge in designing such a service is to provide a standardized, consistent yet easy-to-use API for accessing the data that remains stable even if backend technology changes over time. As a solution, this paper presents the RESTful API of the DARIAH Bit Preservation which includes an administrative extension, and which is secured by an Authentication and Authorization Infrastructure (AAI) based on SAML. An exemplary implementation illustrates that the API offers distributed access by usage of the HTTP protocol and is able to handle a high number of files. Data transfer rates of up to 45 MB/s were achieved for uploading large files in the local network.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"251 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133316345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying Critical Code Sections in Dataflow Programming Models","authors":"V. Subotic, J. Sancho, J. Labarta, M. Valero","doi":"10.1109/PDP.2013.15","DOIUrl":"https://doi.org/10.1109/PDP.2013.15","url":null,"abstract":"Years of practice in optimizing applications indicate that the major issue is focus: identifying the critical code section whose optimization would yield the highest overall speedup. While this issue is mainly solved for sequential applications, it remains a serious hurdle in the world of parallel computing. Furthermore, the newest dataflow parallel programming models expose very irregular parallelism, making the identification of the critical code section even harder. To address this issue, we designed an environment that identifies critical code sections in applications. The programmer can use this environment to estimate the potential benefits of the optimization for a specific parallel platform. This is very important because the programmer can anticipate the benefits of the optimization and ensure that the optimization is worth the effort. Furthermore, we showed that in many applications, the choice of the critical code section decisively depends on the configuration of the target machine. For instance, in HP Linpack, optimizing a task that takes 0.49% of the total computation time yields an overall speedup of less than 0.25% on one machine and, at the same time, an overall speedup of more than 24% on a machine with a different number of cores.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"192 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123012289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kazumi Yoshinaga, Y. Tsujita, A. Hori, Mikiko Sato, M. Namiki, Y. Ishikawa
{"title":"A Delegation Mechanism on Many-Core Oriented Hybrid Parallel Computers for Scalability of Communicators and Communications in MPI","authors":"Kazumi Yoshinaga, Y. Tsujita, A. Hori, Mikiko Sato, M. Namiki, Y. Ishikawa","doi":"10.1109/PDP.2013.43","DOIUrl":"https://doi.org/10.1109/PDP.2013.43","url":null,"abstract":"This paper describes a delegation-based high-throughput MPI communication mechanism under tough memory utilization constraints on a many-core oriented hybrid parallel computer. Towards the Exascale era, hybrid parallel computers consisting of many-core and multi-core architectures both on the same node are in focus. Although many-core architectures such as GPU or Intel MIC have high potential in computing power owing to their large number of computing cores, per-core computing power is lower than that of multi-core CPUs. Furthermore, available memory resources for the many-core CPUs are considerably smaller than those for multi-core CPUs. Thus we may incur a penalty in memory utilization in MPI communications when we utilize a normal MPI library. Here we deploy a delegatee process on each node to merge MPI communications and minimize memory utilization for an MPI communicator. Another advantage of the delegatee process scheme is minimization of memory utilization on many-core CPUs by delegating MPI requests to the associated delegatee process on multi-core CPUs. In this paper, we show performance advantages and effective resource utilization by our proposed scheme compared with the original MPI implementation.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126356588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniele Buono, M. Danelutto, Silvia Lametti, M. Torquati
{"title":"Parallel Patterns for General Purpose Many-Core","authors":"Daniele Buono, M. Danelutto, Silvia Lametti, M. Torquati","doi":"10.1109/PDP.2013.27","DOIUrl":"https://doi.org/10.1109/PDP.2013.27","url":null,"abstract":"Efficient programming of general purpose many-core accelerators poses several challenging problems. The high number of cores available, the peculiarity of the interconnection network, and the complex memory hierarchy organization, all contribute to make efficient programming of such devices difficult. We propose to use parallel design patterns, implemented using algorithmic skeletons, to abstract and hide most of the difficulties related to the efficient programming of many-core accelerators. In particular, we discuss the porting of the FastFlow framework on the Tilera TilePro64 architecture and the results obtained running synthetic benchmarks as well as true application kernels. These results demonstrate the efficiency achieved while using patterns on the TilePro64 both to program stand-alone skeleton-based parallel applications and to accelerate existing sequential code.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124958167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investment Strategies for Credit-Based P2P Communities","authors":"M. Capotă, N. Andrade, J. Pouwelse, D. Epema","doi":"10.1109/PDP.2013.70","DOIUrl":"https://doi.org/10.1109/PDP.2013.70","url":null,"abstract":"P2P communities that use credits to incentivize their members to contribute have emerged over the last few years. In particular, private BitTorrent communities keep track of the total upload and download of each member and impose a minimum threshold for their upload/download ratio, which is known as their sharing ratio. It has been shown that these private communities have significantly better download performance than public communities. However, this performance is based on oversupply, and it has also been shown that it is hard for users to maintain a good sharing ratio to avoid being expelled from the community. In this paper, we address this problem by introducing a speculative download mechanism to automatically manage user contribution in BitTorrent private communities. This mechanism, when integrated in a BitTorrent client, identifies the swarms that have the biggest upload potential, and automatically downloads and seeds them. In other words, it tries to invest the bandwidth of the user in a profitable way. In order to accurately assess the upload potential of swarms, we analyze a private BitTorrent community and derive through multiple regression a predictor for the upload potential based on simple parameters accessible to each peer. The speculative download mechanism uses the predictor to build a cache of profitable swarms to which the peer can contribute. Our results show that 75 % of investment decisions result in an increase in upload bandwidth utilization, with a median 207 % return on investment.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124619134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Reliability-Aware Multi-application Mapping Technique in Networks-on-Chip","authors":"F. Khalili, H. Zarandi","doi":"10.1109/PDP.2013.77","DOIUrl":"https://doi.org/10.1109/PDP.2013.77","url":null,"abstract":"This paper proposes a reliability-aware mapping technique for multiple applications in networks-on-chip. The proposed technique consists of three main steps: 1) Generating a new core graph enriched by spares, based on a given application core graph, 2) Finding the smallest rectangular region to place the given application using a heuristic algorithm, and 3) Searching the whole NoC for the specified region, and selecting a region which results in minimum overall performance and communication energy. Spare cores are connected to all vertices of the application core graph, and their edges are weighted by the failure probability of the processing cores assigned to the application and are updated during the mapping process. Many application core graphs are used to evaluate the proposed technique. The results of 100,000 fault injection experiments show communication energy reduction and performance improvement compared to well-known related techniques in both faulty and fault-free modes.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129716593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
N. Druml, M. Menghin, C. Steger, R. Weiss, Andreas Genser, H. Bock, J. Haid
{"title":"Emulation-Based Test and Verification of a Design's Functional, Performance, Power, and Supply Voltage Behavior","authors":"N. Druml, M. Menghin, C. Steger, R. Weiss, Andreas Genser, H. Bock, J. Haid","doi":"10.1109/PDP.2013.54","DOIUrl":"https://doi.org/10.1109/PDP.2013.54","url":null,"abstract":"Test and verification are essential parts of a product's development cycle. Simulation and emulation are well-known techniques to test and verify the functionality of a design-under-test (DUT) before its tape-out. However, there are additional issues like peak power consumption and supply voltage drops, which can compromise the hardware's functionality. These issues are only partly covered by today's functional hardware emulation test and verification approaches. This paper presents a comprehensive emulation methodology. It combines functional hardware emulation with model-based performance, power, and supply voltage analysis techniques. The DUT, which has to be available in a hardware description language, is integrated into an FPGA along with designated analysis units. These analysis units implement models of the DUT's performance, power consumption, and supply voltage behavior. The presented emulation methodology allows a designer to test designs in such a way that the cycle-accurate results are taken online, in real-time, and verify both functional and performance behavior, as well as power consumption and supply voltage levels. The proposed comprehensive emulation methodology is used, as an example of application, to verify the design of a LEON3 multi-core processor system as well as an RF-powered contactless smart card. The depicted results demonstrate that this emulation approach is suitable for detecting functional misbehavior caused by power and supply voltage hazards and for showing how they influence the performance of the system.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130868978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marco Aldinucci, M. Drocco, Fabio Tordini, M. Coppo, M. Torquati
{"title":"Parallel Stochastic Simulators in System Biology: The Evolution of the Species","authors":"Marco Aldinucci, M. Drocco, Fabio Tordini, M. Coppo, M. Torquati","doi":"10.1109/PDP.2013.66","DOIUrl":"https://doi.org/10.1109/PDP.2013.66","url":null,"abstract":"The stochastic simulation of biological systems is an increasingly popular technique in Bioinformatics. It is often an enlightening technique, especially for multi-stable systems whose dynamics can hardly be captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation-analysis workflow may, however, be computationally expensive, thus compromising the interactivity required in model tuning. In this work we advocate the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators. In particular, the Calculus of Wrapped Components (CWC) simulator, which is designed according to FastFlow's pattern-based approach, is presented and discussed in this work. FastFlow has been extended to also support clusters of multi-cores with minimal coding effort, assessing the portability of the approach.","PeriodicalId":202977,"journal":{"name":"2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133498377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}