{"title":"An Efficient Randomized Routing Protocol for Single-Hop Radio Networks","authors":"S. Rajasekaran, Dolly Sharma, R. Ammar, N. Lownes","doi":"10.1109/ICPP.2010.25","DOIUrl":"https://doi.org/10.1109/ICPP.2010.25","url":null,"abstract":"In this paper we study the important problems of message routing, sorting, and selection in a radio network. A radio network consists of stations where each station is a hand-held device. We consider a single-hop radio network. In a single-hop network it is assumed that each station is within the transmission range of every other station. Let RN(p; k) stand for a single-hop network that has p stations and k communication channels. The problems of sorting and selection have been studied on RN(p; k). For these problems it is assumed that there are n/p elements to start with at each station. At the end of sorting, the smallest n/p elements should be in the first station, the next smallest n/p elements should be in the second station, and so on. The best known prior algorithm for sorting takes 4n/k + o(n/k) broadcast rounds on an RN(p; k). In this paper we present a randomized algorithm that takes only 3n/k + o(n/k) broadcast rounds with high probability. For the selection problem, it is known that the maximum or minimum element can be found in O(log n) rounds on an RN(n; 1), provided broadcast conflicts can be resolved in O(1) time. The problem of general selection has not been addressed. In this paper we present a randomized selection algorithm that takes O(p/k) rounds on an RN(p; k) with high probability. An important message routing problem that is considered in the literature is one where there are n/p packets originating from each station and there are n/p packets destined for each station. The best known routing algorithms take nearly 2n/k time slots. An important open question has been whether there exist algorithms that take only close to n/k time slots. Note that a trivial lower bound for routing is n/k. The existence of such algorithms will be highly relevant especially in emergencies and time-critical situations. In this paper we answer this question by presenting a randomized algorithm that takes nearly n/k time slots with high probability.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132400794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modelization and Performance Evaluation of the DIET Middleware","authors":"E. Caron, B. Depardon, F. Desprez","doi":"10.1109/ICPP.2010.45","DOIUrl":"https://doi.org/10.1109/ICPP.2010.45","url":null,"abstract":"It is nowadays common to use a grid middleware to access distributed resources in order to solve large problems. Many middleware can be found in the literature. Whereas they all rely on the use of resource brokers (also sometimes called agents) to schedule jobs, and servers to execute them, they do not share the same structure. Many rely on a simple design, e.g., a star graph (one agent managing several servers); and the most scalable ones rely on a hierarchy of agents. To the best of our knowledge, very few studies have been conducted on the modelization and performance prediction of such middleware. In this paper, we present a model for hierarchical middleware, i.e., based on a tree structure. Then, we compare our model predictions with real executions on the DIET middleware, and provide a few hints on how to attain the best performance.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134063280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Simple Effective Scheme to Enhance the Capability of Web Servers Using P2P Networks","authors":"Jie Yu, Liming Lu, Zhoujun Li, Xiaofeng Wang, Jinshu Su","doi":"10.1109/ICPP.2010.76","DOIUrl":"https://doi.org/10.1109/ICPP.2010.76","url":null,"abstract":"Nowadays, web servers are suffering from flash crowds and application layer DDoS attacks that can severely degrade the availability of services. It is difficult to prevent them because they comply with the communication protocol. Peer-to-peer (P2P) networks have been exploited to amplify DDoS attacks, but we believe their available resources, such as distributed storage and network bandwidth, can be used to mitigate both flash crowds and DDoS attacks. In this paper, we propose a server-initiated approach to employ deployed P2P networks as distributed web caches, so that the workload directed to web servers can be reduced. In experiments, we use Kad as the particular P2P network for the realization of a large-scale distributed web cache. We performed a comprehensive evaluation of the feasibility, efficiency and robustness of our scheme, through experiments and simulations on the prototype we implemented. The evaluation results show that our scheme can increase the capacity of the protected web servers at least 10 times at the same cost of connection and bandwidth consumption. The web contents cached in Kad remain reachable even under churn of peers and targeted DoS attack, and the access latency is comparable to normal direct access to web servers. It also achieves good load balancing under the heavy-tailed distribution of object popularity.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122682365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying the Root Causes of Wait States in Large-Scale Parallel Applications","authors":"David Böhme, M. Geimer, F. Wolf, L. Arnold","doi":"10.1145/2934661","DOIUrl":"https://doi.org/10.1145/2934661","url":null,"abstract":"Driven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. By replaying event traces in parallel both in forward and backward direction, we can identify the processes and call paths responsible for the most severe imbalances even for runs with tens of thousands of processes.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132489187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FlashCoop: A Locality-Aware Cooperative Buffer Management for SSD-Based Storage Cluster","authors":"Q. Wei, Bozhao Gong, Suraj Pathak, Y. Tay","doi":"10.1109/ICPP.2010.71","DOIUrl":"https://doi.org/10.1109/ICPP.2010.71","url":null,"abstract":"Random writes significantly limit the application of flash-based Solid State Drives (SSDs) in enterprise environments due to their poor latency, negative impact on SSD lifetime and high garbage collection overhead. To relieve the above limitations, we propose a locality-aware cooperative buffer scheme referred to as FlashCoop (Flash Cooperation), which leverages free memory of neighboring storage servers to buffer writes over a high-speed network. Both temporal and sequential localities of the access pattern are exploited in the design of cooperative buffer management. Leveraging the filtering effect of the cooperative buffer, FlashCoop can efficiently shape the I/O request stream and improve the sequentiality of the write accesses passed to the SSD. FlashCoop has been extensively evaluated under various enterprise workloads. Our benchmark results conclusively demonstrate that FlashCoop can achieve 52.3% performance improvement and 56.5% garbage collection overhead reduction compared to the system without FlashCoop.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125830574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lock-Free Multiway Search Trees","authors":"Michael Spiegel, P. Reynolds","doi":"10.1109/ICPP.2010.68","DOIUrl":"https://doi.org/10.1109/ICPP.2010.68","url":null,"abstract":"We propose a lock-free multiway search tree algorithm for concurrent applications with large working set sizes. Our algorithm is a variation of the randomized skip tree. We relax the ordering constraints among the nodes in the original skip tree definition. Optimal paths through the tree are temporarily violated by mutation operations, and eventually restored using online node compaction. Experimental evidence shows that our lock-free skip tree outperforms a highly tuned concurrent skip list under workloads of various proportions of operations and working set sizes. The max throughput of our algorithm is on average 41% higher than the throughput of the skip list, and 129% higher on the workload of the largest working set size and read-dominated operations.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125988770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisting Tag Collision Problem in RFID Systems","authors":"Lei Yang, Jinsong Han, Yong Qi, Cheng Wang, Yunhao Liu, Ying Chen, Xiao Zhong","doi":"10.1109/ICPP.2010.27","DOIUrl":"https://doi.org/10.1109/ICPP.2010.27","url":null,"abstract":"In RFID systems, the reader is unable to discriminate concurrently reported IDs of tags from the overlapped signals, and a collision happens. Many anti-collision algorithms have been proposed to improve the throughput and reduce the latency of tag identification. Existing anti-collision algorithms mainly employ CRC-based collision detection functions to determine whether a collision happens. Generating CRC codes, however, requires complicated computations for both RF tags and readers, and hence incurs non-trivial time consumption, becoming the bottleneck. In this study, we design a Quick Collision Detection (QCD) scheme based on the bitwise complement function plus collision preamble, which significantly reduces the number of gates for computation and helps simplify the IC design of RFID tags. The QCD scheme does not require any modification of upper-level air protocols, so it can be seamlessly adopted by current anti-collision algorithms. Through comprehensive analysis and simulations, we show that QCD improves the identification efficiency by 40%.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132399813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}