{"title":"BSP2OMP: A compiler for translating BSP programs to OpenMP","authors":"A. Marowka","doi":"10.1080/17445760902719927","DOIUrl":"https://doi.org/10.1080/17445760902719927","url":null,"abstract":"The convergence of the two widely used parallel programming paradigms, shared-memory and distributed-shared-memory parallel programming models, into a unified parallel programming model is crucial for parallel computing to become the next mainstream programming paradigm. We study the design differences and the performance issues of two parallel programming models: a shared-memory programming model (OpenMP) and a distributed-shared programming model (BSP). The study was carried out by designing a compiler, called BSP2OMP, for translating BSP parallel programs to an OpenMP programming model. Analysis of the compiler outcome, and of the performance of the compiled programs, shows that the two models are based on very similar underlying principles and mechanisms.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126989891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Overfort: Combating DDoS with peer-to-peer DDoS puzzle","authors":"Soon Hin Khor, A. Nakao","doi":"10.1109/IPDPS.2008.4536561","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536561","url":null,"abstract":"The Internet community has long been convinced that distributed denial-of-service (DDoS) attacks are difficult to combat since IP spoofing prevents traceback to the sources of attacks. Even if traceback is possible, the sheer number of sources that must be shut down renders traceback, by itself, ineffective. Due to this belief, much effort has been focused on winning the \"arms race\" against DDoS by over-provisioning resources. This paper shows how Overfort can possibly withstand DDoS onslaughts without being drawn into an arms race by using higher-level traceback to DDoS agents' local DNSes (LDNSes) and dealing with those LDNSes instead. Overfort constructs an on-demand overlay using multiple overlay-ingress gateways with their links partitioned into many virtual links - each with different bandwidth and IP - leading to the server to project the illusion of multiple server IPs. An attacker will be faced with the daunting puzzle of finding all the IPs and thereafter the confusion of how much traffic to clog each IP with. Furthermore, Overfort has a mechanism to segregate LDNSes that are serving DDoS agents and restrict them to a limited number of IPs, thus saving the other available IPs for productive use. Both the proliferation of access channels to the server and the LDNS segregation mechanism are key components in Overfort to defend against DDoS with significantly fewer resources.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127002762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and accurate FPGA-based simulator for Molecular Dynamics","authors":"Eunjung Cho, A. Bourgeois, José Alberto Fernández-Zepeda","doi":"10.1109/IPDPS.2008.4536517","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536517","url":null,"abstract":"A molecular dynamics (MD) system is defined by the position and momentum of particles and their interactions. Solving the dynamics numerically and evaluating the interaction is computationally expensive even for a small number of particles in the system. We are focusing on long-ranged interactions, since the calculation time is O(N^2) for an N particle system. There are many existing algorithms aimed at reducing the calculation time of MD simulations. Among the existing algorithms, the multigrid (MG) method [1] reduces O(N^2) calculation time to O(N) time while still achieving reasonable accuracy. Another movement to achieve much faster calculation time is running MD simulation on special purpose processors and customized hardware with ASICs or FPGAs. In this paper, we design and implement an FPGA-based MD simulator with an efficient MG method.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127656789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization via Reflection on Work Stealing in TBB","authors":"A. Robison, Michael J. Voss, Alexey Kukanov","doi":"10.1109/IPDPS.2008.4536188","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536188","url":null,"abstract":"Intel® Threading Building Blocks (Intel® TBB) is a C++ library for parallel programming. Its templates for generic parallel loops are built upon nested parallelism and a work-stealing scheduler. This paper discusses optimizations where the high-level algorithm inspects or biases stealing. Two optimizations are discussed in detail. The first dynamically optimizes grain size based on observed stealing. The second improves prior work that exploits cache locality by biased stealing. This paper shows that in a task stealing environment, deferring task spawning can improve performance in some contexts. Performance results for simple kernels are presented.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"os-28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127773731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing power, performance and reliability trade-offs","authors":"P. Raghavan, M. Kandemir, M. J. Irwin, K. Malkowski","doi":"10.1109/IPDPS.2008.4536422","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536422","url":null,"abstract":"We present recent research on utilizing power, performance and reliability trade-offs in meeting the demands of scientific applications. In particular we summarize results of our recent publications on (i) phase-aware adaptive hardware selection for power-efficient scientific computations, (ii) adapting application execution to reduced CPU availability, and (iii) a helper thread based EDP reduction scheme for adapting application execution in CMPs.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128134132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and efficient formation flocking for a group of autonomous mobile robots","authors":"N. Xiong, Yingshu Li, J. Park, L. Yang, Yan Yang, Sun Tao","doi":"10.1109/IPDPS.2008.4536482","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536482","url":null,"abstract":"The control and coordination of mobile robots in groups that can freely cooperate and move on a plane is a widely studied topic in distributed robotics. In this paper, we focus on the flocking problem: there are two kinds of robots: the leader robot and the follower robots. The follower robots are required to follow the leader robot wherever it goes (following), while keeping a formation they are given in input (flocking). A novel scheme is proposed based on the relative motion theory. Extensive theoretical analysis and simulation results demonstrate that this scheme provides the follower robots an efficient method to follow the leader as soon as possible with the shortest path. Furthermore, this scheme is scalable, and the processing load for every robot is not increased with the addition of more robots.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128134990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SIFT implementation and optimization for multi-core systems","authors":"Qi Zhang, Yurong Chen, Yimin Zhang, Yinlong Xu","doi":"10.1109/IPDPS.2008.4536131","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536131","url":null,"abstract":"Scale invariant feature transform (SIFT) is an approach for extracting distinctive invariant features from images, and it has been successfully applied to many computer vision problems (e.g. face recognition and object detection). However, the SIFT feature extraction is compute-intensive, and a real-time or even super-real-time processing capability is required in many emerging scenarios. Nowadays, with the multi-core processor becoming mainstream, SIFT can be accelerated by fully utilizing the computing power of available multi-core processors. In this paper, we propose two parallel SIFT algorithms and present some optimization techniques to improve the implementation's performance on multi-core systems. The results show that our improved parallel SIFT implementation can process general video images in super-real-time on a dual-socket, quad-core system, and the speed is much faster than the implementation on GPUs. We also conduct a detailed scalability and memory performance analysis on the 8-core system and on a 32-core chip multiprocessor (CMP) simulator. The analysis helps us identify possible causes of bottlenecks, and we suggest avenues for scalability improvement to make this application more powerful on future large-scale multi-core systems.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133552084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Service replication in Grids: Ensuring consistency in a dynamic, failure-prone environment","authors":"André Luckow, Bettina Schnor","doi":"10.1109/IPDPS.2008.4536211","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536211","url":null,"abstract":"A major challenge in a service-oriented environment such as a Grid is fault tolerance. The more resources and services are involved, the more complicated and error-prone the system becomes. Migol (Luckow and Schnor, 2008) is a Grid middleware that addresses the fault tolerance of Grid applications and services. Migol's core component is its registry service, called the application information service (AIS). To achieve fault tolerance and high availability, the AIS is replicated on different sites. Since a registry is a stateful Web service, the replication of the AIS is no trivial task. In this paper, we present our concept for active replication of Grid services. Migol's Replication Service uses a token-based algorithm and certificate-based security to provide secure group communication. Further, we show in different experiments that active replication in a real Grid environment is feasible.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133069245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy efficient sleep scheduling based on moving directions in target tracking sensor network","authors":"Bo Jiang, K. Han, B. Ravindran, Hyeonjoong Cho","doi":"10.1109/IPDPS.2008.4536330","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536330","url":null,"abstract":"This paper presents a target direction-based sleep scheduling algorithm (TDSS) for target tracking surveillance sensor networks. TDSS reduces the number of proactively awakened sensor nodes and schedules their sleep pattern to enhance energy efficiency while suffering little performance loss. Both approaches are based on two probabilistic distribution models of target moving directions, normal distribution and linear distribution. We compare TDSS with the two models against the legacy circle-based proactive wake-up scheme (Circle) and a working node reducing algorithm - MCTA. The evaluation result shows that TDSS achieves better energy efficiency with less performance loss in terms of detection probability and detection delay.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133395636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PLP: Towards a realistic and accurate model for communication performances on hierarchical cluster-based systems","authors":"W. Nasri, Olfa Tarhouni, Nadia Slimi","doi":"10.1109/IPDPS.2008.4536486","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536486","url":null,"abstract":"Today, for many reasons, such as the inherent heterogeneity, the diversity, and the continuous evolution of actual computational supports, writing efficient parallel applications on such systems represents a great challenge. One way to address this problem is to optimize the communications of such applications. Our objective in this work is to design a realistic model able to accurately predict the cost of communication operations on execution environments characterized by both heterogeneity and hierarchical structure. We principally aim to guarantee a good quality of prediction with negligible additional overhead. The proposed model was applied to point-to-point and collective communication operations, and experiments on a hierarchical cluster-based system with heterogeneous resources showed that the predicted performances are close to the measured ones.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123311102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}