{"title":"Modeling Network-Level Impacts of P2P Flows","authors":"Márk Jelasity, Vilmos Bilicki, Miklos Kasza","doi":"10.1109/PDP.2011.9","DOIUrl":"https://doi.org/10.1109/PDP.2011.9","url":null,"abstract":"It has long been clear that P2P applications represent a large proportion of the load on the network infrastructure. This is why significant research efforts have been devoted to reducing this load in the form of ISP-friendly P2P solutions. These solutions focus on the volume of the traffic as opposed to the number of network flows. At the same time, we are witnessing a great demand for more and more intelligence in the network, such as flow-based monitoring and application recognition, which have an overhead that depends on the number of flows and not on the volume of the traffic. Moreover, the implementation of this intelligence is moving from the access layer towards the distribution and core layers. We show through measurements that the typical devices serving in the different layers of the infrastructure are not sufficiently scalable in terms of the number of flows, and, most importantly, that the combined effect of an increase in access-layer bandwidth together with an increase in the P2P (e.g., BitTorrent) population will practically disable the intelligent networking capabilities. Our conclusion is that a novel focus needs to be incorporated into P2P research that concentrates on reducing the number of network flows generated by P2P applications.","PeriodicalId":341803,"journal":{"name":"2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131292534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transport Optimization in Peer-to-Peer Networks","authors":"K. Miller, A. Wolisz","doi":"10.1109/PDP.2011.26","DOIUrl":"https://doi.org/10.1109/PDP.2011.26","url":null,"abstract":"The peer-to-peer networking concept has revolutionized the cost structure of Internet data dissemination by making large-scale content delivery with low server cost feasible. In a peer-to-peer network, the total upload capacity increases with the number of downloaders instead of staying constant as in a client-server architecture, making it highly scalable. Despite its importance, the problem of efficient data transport in a peer-to-peer network is still an open issue, mainly due to its complex combinatorial structure. In the presented work, we formulate the problem of optimizing a peer-to-peer download with respect to its makespan (the time until all peers finish downloading) as a mixed integer linear program. Unlike previous studies, we consider the case of arbitrary heterogeneous uplink and downlink capacities of the peers. Moreover, we do not consider the fluid limit case but allow the file to be subdivided into finitely many chunks. On the one hand, our results allow us to infer the capacity of a peer-to-peer network, providing a benchmark for performance analysis of existing peer-to-peer protocols. On the other hand, we believe that our results are a step towards the development of efficient algorithms serving as a base for the design of data transport protocols leveraging the peer-to-peer concept.","PeriodicalId":341803,"journal":{"name":"2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing","volume":"337 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115670753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Task Migration in Mesh NoCs over Virtual Point-to-Point Connections","authors":"B. Goodarzi, H. Sarbazi-Azad","doi":"10.1109/PDP.2011.71","DOIUrl":"https://doi.org/10.1109/PDP.2011.71","url":null,"abstract":"Processor allocation in today's many-core MPSoCs is a challenging task, especially since the order and requirements of incoming applications are unknown during the design stage. Task migration, a method for improving network performance, balancing the workload across processing cores, or mitigating the effect of hot processing elements in thermal management methodologies, has attracted much attention in recent years. Runtime task migration was first proposed for multicomputers, with load balancing as the major objective. However, specific NoC properties such as a limited amount of communication buffers, greater sensitivity to implementation complexity, and tight latency and power consumption constraints bring new challenges in using task migration mechanisms in NoCs. As a consequence, the efficiency and applicability of traditional migration mechanisms (developed for multicomputers) are in question. Given the limited resource budget in NoC-based MPSoCs as well as the tight performance constraints of running applications, in this paper we propose an efficient methodology based on virtual point-to-point (VIP for short) connections. These dedicated VIP connections provide low-latency and low-power paths for heavy communication flows created by task migration mechanisms. Analysis of the results shows that the proposed scheme reduces message latency by 13% and migration latency by 14%, while achieving 10% power savings, compared to the previously proposed task migration strategy (known as Gathering-Rout-Scattering) for mesh multiprocessors.","PeriodicalId":341803,"journal":{"name":"2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing","volume":"20 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120874022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Deadline Satisfaction Enhanced Workflow Scheduling Algorithm","authors":"Xi Li, Zhi-gang Hu, Chaokun Yan","doi":"10.1109/PDP.2011.29","DOIUrl":"https://doi.org/10.1109/PDP.2011.29","url":null,"abstract":"Meeting users' deadline constraints is usually the most important goal of workflow scheduling in Grid environments. To account for the dynamism of Grid resources, we adopted a stochastic model to describe the dynamic workloads of Grid resources. A concept called Deadline Satisfaction Degree of Workflow (DSDW) was defined to represent the probability that a workflow can be completed before its deadline. We calculated task execution priorities based on their precedence relations in the workflow, then determined the candidate resource for each task so as to maximize DSDW, and finally converted the problem of distributing the overall workflow deadline into a constrained nonlinear programming problem, which we solved with known methods. A Deadline Satisfaction Enhanced Scheduling Algorithm for Workflow (DSESAW) involving deadline distribution and resource selection is presented. Extensive simulation experiments using a practical medical image analysis application were conducted to verify our algorithm. Experimental results indicated that our algorithm could adapt to the dynamic Grid environment and provide a good guarantee for users' deadline requirements.","PeriodicalId":341803,"journal":{"name":"2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124321820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-core Desktop Processors Make Possible Real-Time Electron Tomography","authors":"Jose Ignacio Agulleiro Baldo, E. M. Garzón, I. García, José-Jesús Fernández","doi":"10.1109/PDP.2011.36","DOIUrl":"https://doi.org/10.1109/PDP.2011.36","url":null,"abstract":"Electron tomography (ET) allows elucidation of the three-dimensional (3D) structure of large complex biological specimens at molecular resolution. In order to achieve such resolution levels, large projection images have to be used to compute the 3D reconstructions. Tomographic reconstruction on this scale requires tremendous computational resources and considerable processing time. Traditionally, parallel and distributed systems, and more recently GPUs, have been the key to coping with this demanding procedure. This work demonstrates that full exploitation of the impressive processing power within modern multi-core processors makes them a feasible alternative. The use of parallel computing, vectorization and code optimization allows ultra-fast tomographic reconstructions on standard computers, even outperforming GPUs. Our results confirm that modern processors succeed in providing reconstructed volumes in very little time, which makes them suitable for real-time ET.","PeriodicalId":341803,"journal":{"name":"2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125922295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Javaspace-Based Framework for Efficient Fault-Tolerant Master-Worker Distributed Applications","authors":"V. Galtier, C. Makassikis, S. Vialle","doi":"10.1109/PDP.2011.82","DOIUrl":"https://doi.org/10.1109/PDP.2011.82","url":null,"abstract":"We propose a framework built around a JavaSpace to ease the development of bag-of-tasks applications. The framework may optionally and automatically tolerate transient crash failures occurring on any of the distributed elements. It relies on checkpointing and underlying middleware mechanisms to do so. To further improve checkpointing efficiency, both in size and frequency, the programmer can introduce intermediate user-defined checkpoint data and code within the task processing program. When used without fault tolerance, the framework accelerates application development, introduces no runtime overhead, and yields the expected speedup. When fault tolerance is enabled, our framework allows, despite failures, correct completion of applications with limited runtime and data storage overheads. Experiments run with up to 128 workers study the impact of some user-related and implementation-related parameters on overall performance, and reveal good performance for classical JavaSpace-based master-worker application profiles.","PeriodicalId":341803,"journal":{"name":"2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116769358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Designing Multicore-Aware Simulators for Biological Systems","authors":"Marco Aldinucci, M. Coppo, Ferruccio Damiani, M. Drocco, M. Torquati, Angelo Troina","doi":"10.1109/PDP.2011.81","DOIUrl":"https://doi.org/10.1109/PDP.2011.81","url":null,"abstract":"The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, but it can be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas or derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC (Calculus of Wrapped Compartments) simulator, carried out by way of the FastFlow programming framework, which enables fast development and efficient execution on multi-cores.","PeriodicalId":341803,"journal":{"name":"2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116916771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Parameter Sweep Applications Using CUDA","authors":"M. Motokubota, Fumihiko Ino, K. Hagihara","doi":"10.1109/PDP.2011.19","DOIUrl":"https://doi.org/10.1109/PDP.2011.19","url":null,"abstract":"This paper proposes a parallelization scheme for parameter sweep (PS) applications using the compute unified device architecture (CUDA). Our scheme focuses on PS applications with irregular access patterns, which usually result in lower performance on the GPU. The key idea to resolve this irregularity is to exploit the similarity of data accesses between different parameters. That is, the scheme simultaneously processes multiple parameters instead of a single parameter. This simultaneous sweep allows data accesses to be coalesced into a single access if the irregularity appears similarly at every parameter. It also reduces the amount of off-chip memory access by using fast on-chip memory for the data commonly accessed for multiple parameters. As a result, the scheme achieves up to 4.5 times higher performance than a naive scheme that processes a single parameter per kernel invocation.","PeriodicalId":341803,"journal":{"name":"2011 19th International Euromicro Conference on Parallel, Distributed and Network-Based Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128590921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}