{"title":"Migration decision for hybrid mobility in reconfigurable distributed virtual machines","authors":"S. Fu, Chengzhong Xu","doi":"10.1109/ICPP.2004.1327940","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327940","url":null,"abstract":"Virtual machine (VM) is an important mechanism to multiplex computer resources. The increasing popularity of network computing has renewed research interests in the adaptive and distributed virtual machines. Service migration is a vital technique to construct reconfigurable VMs. By incorporating mobile agent technology, VM systems can improve their resource utilization, load-balancing and fault-tolerance significantly. This work focuses on the decision problem of hybrid mobility for load-balancing in reconfigurable distributed VMs. We tackle this problem from three aspects: migration candidate determination, migration timing and destination server selection. The service migration timing and destination server selection are formulated as two optimization models. We derive the optimal migration policy for distributed and heterogeneous systems based on stochastic optimization theories. Renewal processes are applied to model the dynamics of migration. We solve the agent migration problem by dynamic programming and extend the optimal service migration decision by considering the interplay of the hybrid mobility. Our decision policy is complementary to the existing service and agent migration techniques. Its accuracy is verified by simulations.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121725334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel routing and wavelength assignment for optical multistage interconnection networks","authors":"E. Lu, Si-Qing Zheng","doi":"10.1109/ICPP.2004.1327924","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327924","url":null,"abstract":"Multistage interconnection networks (MINs) are among the most efficient switching architectures in terms of the number of switching elements (SEs) used. For optical MINs (OMINs), two I/O connections with neighboring wavelengths cannot share a common SE due to crosstalk. In this paper, we focus on the wavelength dilation approach, in which the I/O connections sharing a common SE will be assigned different wavelengths with enough wavelength spacing. We first study the permutation capacity of OMINs, then propose fast parallel routing and wavelength assignment algorithms for OMINs. By applying our permutation decomposition and graph coloring techniques, the proposed algorithms can route any permutation without crosstalk in wavelength-rearrangeable space-strict-sense Banyan networks and wavelength-rearrangeable space-rearrangeable Benes networks in polylogarithmic time using a linear number of processors.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127849554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm design and synthesis for wireless sensor networks","authors":"A. Bakshi, V. Prasanna","doi":"10.1109/ICPP.2004.1327951","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327951","url":null,"abstract":"Most current research in wireless networked embedded sensing approaches the problem of application design as one of manually customizing network protocols. The design complexity and required expertise make this unsuitable for increasingly complex sensor network systems. We address this problem from a parallel and distributed systems perspective and propose a methodology that enables domain experts to design, analyze, and synthesize sensor network applications without requiring a knowledge of implementation details. At the core of our methodology is a virtual architecture for a class of sensor networks that hides enough system details to relieve programmers of the burden of managing low-level control and coordination, and provides algorithm designers with a clean topology and cost model. We illustrate this methodology using a real-world topographic querying application as a case study.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134644881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Job fairness in non-preemptive job scheduling","authors":"Gerald Sabin, G. Kochhar, P. Sadayappan","doi":"10.1109/ICPP.2004.1327920","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327920","url":null,"abstract":"Job scheduling has been a much studied topic over the years. While past research has studied the effect of various scheduling policies using metrics such as turnaround time, slowdown, utilization etc., there has been little research on how fair a nonpreemptive scheduling policy is. We propose an approach to assessing fairness in nonpreemptive job scheduling. Our basic model of fairness is that no later arriving job should delay an earlier arriving job. We quantitatively assess the fairness of several job scheduling strategies and propose a new strategy that seeks to improve fairness.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131328126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic, power-aware scheduling for mobile clients using a transparent proxy","authors":"M. Gundlach, Sarah Doster, H. Yan, D. Lowenthal, S. Watterson, Surendar Chandra","doi":"10.1109/ICPP.2004.1327966","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327966","url":null,"abstract":"Mobile computers consume significant amounts of energy when receiving large files. The wireless network interface card (WNIC) is the primary source of this energy consumption. One way to reduce the energy consumed is to transmit the packets to clients in a predictable fashion. Specifically, the packets can be sent in bursts to clients, who can then switch to a lower power sleep state between bursts. This technique is especially effective when the bandwidth of a stream is small. This work investigates techniques for saving energy in a multiple-client scenario, where clients may be receiving either UDP or TCP data. Energy is saved by using a transparent proxy that is invisible to both clients and servers. The proxy implementation maintains separate connections to the client and server so that a large increase in transmission time is avoided. The proxy also buffers data and dynamically generates a global transmission schedule that includes all active clients. Results show that energy savings within 10-15% of optimal are common, with little packet loss.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124761644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient and scalable all-to-all personalized exchange for InfiniBand-based clusters","authors":"S. Sur, Hyun-Wook Jin, D. Panda","doi":"10.1109/ICPP.2004.1327932","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327932","url":null,"abstract":"The all-to-all personalized exchange is the most dense collective communication function offered by the MPI specification. The operation involves every process sending a different message to all other participating processes. This collective operation is essential for many parallel scientific applications. With increasing system and message sizes, it becomes challenging to offer a fast, scalable and efficient implementation of this operation. InfiniBand is an emerging modern interconnect. It offers very low latency, high bandwidth and one-sided operations like RDMA write. Its advanced features like RDMA write gather allow us to design and implement all-to-all algorithms much more efficiently than in the past. Our aim in this work is to design efficient and scalable implementations of traditional personalized exchange algorithms. We present two novel approaches towards designing all-to-all algorithms for short and long messages respectively. The hypercube RDMA write gather and direct eager schemes effectively leverage the RDMA and RDMA with write gather mechanisms offered by InfiniBand. Performance evaluation of our design and implementation reveals that it is able to reduce the all-to-all communication time by up to a factor of 3.07 for 32 byte messages on a 16 node InfiniBand cluster. Our analytical models suggest that the proposed designs perform 64% better on InfiniBand clusters with 1024 nodes for 4k message size.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126559750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply","authors":"Benjamin C. Lee, R. Vuduc, J. Demmel, K. Yelick","doi":"10.1109/ICPP.2004.1327917","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327917","url":null,"abstract":"We present optimizations for sparse matrix-vector multiply (SpMV) and its generalization to multiple vectors, SpMM, when the matrix is symmetric: (1) symmetric storage, (2) register blocking, and (3) vector blocking. Combined with register blocking, symmetry saves more than 50% in matrix storage. We also show performance speedups of 2.1× for SpMV and 2.6× for SpMM, when compared to the best nonsymmetric register blocked implementation. We present an approach for the selection of tuning parameters, based on empirical modeling and search that consists of three steps: (1) Off-line benchmark, (2) Runtime search, and (3) Heuristic performance model. This approach generally selects parameters to achieve performance within 85% of that achieved with exhaustive search. We evaluate our implementations with respect to upper bounds on performance. Our model bounds performance by considering only the cost of memory operations and using lower bounds on the number of cache misses. Our optimized codes are within 68% of the upper bounds.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122847443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"POSE: getting over grainsize in parallel discrete event simulation","authors":"Terry Wilmarth, L. Kalé","doi":"10.1109/ICPP.2004.1327899","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327899","url":null,"abstract":"Parallel discrete event simulations (PDES) encompass a broad range of analytical simulations. Their utility lies in their ability to model a system and provide information about its behavior in a timely manner. Current PDES methods provide limited performance improvements over sequential simulation. Many logical models for applications have fine granularity making them challenging to parallelize. In POSE, we examine the overhead required for optimistically synchronizing events. We have designed an object model based on the concept of visualization and new adaptive optimistic methods to improve the performance of fine-grained PDES applications. These novel approaches exploit the speculative nature of optimistic protocols to improve single-processor parallel over sequential performance and achieve scalability for previously hard-to-parallelize fine-grained simulations.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122905364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal parallel scheduling algorithm for WDM optical interconnects with recirculating buffering","authors":"Zhenghao Zhang, Yuanyuan Yang","doi":"10.1109/ICPP.2004.1327955","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327955","url":null,"abstract":"We study scheduling algorithms in WDM optical interconnects with recirculating buffering. The interconnect we consider has wavelength conversion capabilities. We focus on limited range wavelength conversion while considering full range wavelength conversion as a special case. We formalize the problem of maximizing throughput and minimizing packet delay in such an interconnect as a matching problem in a bipartite graph and give an optimal parallel algorithm that runs in O(Bk^2), as compared to O((N+B)^3 k^3) time if directly applying other existing matching algorithms, where N is the number of input/output fibers, B is the number of fiber delay lines and k is the number of wavelengths per fiber.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"108 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132801725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Packet size optimization for supporting coarse-grained pipelined parallelism","authors":"Wei Du, G. Agrawal","doi":"10.1109/ICPP.2004.1327929","DOIUrl":"https://doi.org/10.1109/ICPP.2004.1327929","url":null,"abstract":"The emergence of grid and a new class of data-driven applications is making a new form of parallelism desirable, which we refer to as coarse-grained pipelined parallelism. We focus on the problem of choosing packet size, i.e., the unit of transfer between the pipeline units, in exploiting this form of parallelism. We develop an analytical model for this purpose. Because the pipeline includes both communication and computation phases, the frequency and/or volume of communication between different phases can be different. We consider two models, fixed-frequency and fixed-size, and derive mathematical expressions for both. We have carried out detailed evaluation of our models using three applications, executed with different parameters and datasets. Our experiments show that the choice of packet size makes a significant difference in the execution time, and the packet sizes suggested by the model result in the lowest or very close to the lowest possible execution time.","PeriodicalId":106240,"journal":{"name":"International Conference on Parallel Processing, 2004. ICPP 2004.","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124269582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}