{"title":"Cost Minimization for Scheduling Parallel, Single-Threaded, Heterogeneous, Speed-Scalable Processors","authors":"Rashid Khogali, O. Das","doi":"10.1109/ICPADS.2013.46","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.46","url":null,"abstract":"We introduce an online scheduling algorithm to optimally assign a set of arriving heterogeneous tasks to heterogeneous speed-scalable processors. The goal of our algorithm is to minimize the total cost of response time and energy consumption (TCRTEC) of the tasks. We have three contributions that constitute the algorithm. First, we propose a novel task dispatching strategy for assigning the tasks to the processors. Second, we propose a novel preemptive service discipline called Smallest remaining Computation Volume Per unit Price of response Time (SCVPPT) to schedule the tasks on the assigned processor. Third, we propose a dynamic speed-scaling function that explicitly determines the optimum processing rate of each task. In our work, the processors are heterogeneous in that they may differ in their hardware specifications with respect to maximum processing rate and power functions. Tasks are heterogeneous in terms of computation volume and processing requirements. We also consider that the unit price of response time for each task is heterogeneous. Each task's unit price of response time is allowed to differ because the user may be willing to pay higher/lower unit prices for certain tasks, thereby increasing/decreasing their optimum processing rates. In our SCVPPT discipline, a task's scheduling priority is influenced by its remaining computation volume as well as its unit price of response time. Our simulation results show that SCVPPT outperforms the two known service disciplines, Shortest Remaining Processing Time (SRPT) and the First Come First Serve (FCFS), in terms of minimizing the TCRTEC performance metric. The results also show that the algorithm's dispatcher outperforms the well known Round Robin dispatcher when the processors are heterogeneous. We focus on multi-buffer, single-threading where a set of tasks is allocated to a given processor, but only one task is processed at a time until completion unless preemption is dictated by the service discipline.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134151369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dynamic Hybrid Resource Provisioning Approach for Running Large-Scale Computational Applications on Cloud Spot and On-Demand Instances","authors":"Sifei Lu, Xiaorong Li, Long Wang, Henry Kasim, H. Palit, T. Hung, E. F. Legara, G. Lee","doi":"10.1109/ICPADS.2013.117","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.117","url":null,"abstract":"Testing and executing large-scale computational applications in public clouds is becoming prevalent due to cost saving, elasticity, and scalability. However, how to increase the reliability and reduce the cost to run large-scale applications in public clouds is still a big challenge. In this paper, we analyzed the pricing schemes of Amazon Elastic Compute Cloud (EC2) and found the disturbance effect that the price of the spot instances can be heavily affected due to the large number of spot instances required. We proposed a dynamic approach which schedules and runs large-scale computational applications on a dynamic pool of cloud computational instances. We use hybrid instances, including both on-demand instances for high priority tasks and backup, and spot instances for normal computational tasks so as to further reduce the cost without significantly increasing the completion time. Our proposed method takes the dynamic pricing of cloud instances into consideration, and it reduces the cost and tolerates the failures for running large-scale applications in public clouds. We conducted experimental tests and an agent based Scalable complex System modeling for Sustainable city (S3) application is used to evaluate the scalability, reliability and cost saving. The results show that our proposed method is robust and highly flexible for researchers and users to further reduce cost in real practice.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121369111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mark-Sharing: A Parallel Garbage Collection Algorithm for Low Synchronization Overhead","authors":"Hyun-Kyoo Park, Changmin Lee, Seung-Hun Kim, W. Ro, J. Gaudiot","doi":"10.1109/ICPADS.2013.16","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.16","url":null,"abstract":"Two main problems prevent a parallel garbage collection (GC) scheme with lock-based synchronization from providing a high level of scalability: the load imbalance and the runtime overhead of thread synchronization operations. These problems become even more serious as the number of available threads increases. We propose the Mark-Sharing algorithm to improve the performance of parallel GC using transactional memory (TM) systems. The Mark-Sharing algorithm guarantees that all threads access the shared resource by using both the task-stealing and task-releasing mechanisms appropriately. In addition, we introduce a selection manager that minimizes the contention and idle time of garbage collectors by maintaining task information. The proposed algorithm outperforms the prior pool-sharing algorithm of GC in the HTM, providing more than 90% performance improvement on average.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129334432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nuclear Fusion Simulation Code Optimization on GPU Clusters","authors":"N. Fujita, Hideo Nuga, T. Boku, Y. Idomura","doi":"10.1109/ICPADS.2013.65","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.65","url":null,"abstract":"GT5D is a nuclear fusion simulation program which aims to analyze the turbulence phenomena in tokamak plasma. In this research, we optimize it for GPU clusters with multiple GPUs on a node. Based on the profile result of GT5D on a CPU node, we decide to offload the whole of the time development part of the program to GPUs except MPI communication. We achieved 3.37 times faster performance in maximum in function level evaluation, and 2.03 times faster performance in total than the case of CPU-only execution, both in the measurement on high density GPU cluster HA-PACS where each computation node consists of four NVIDIA M2090 GPUs and two Intel Xeon E5-2670 (Sandy Bridge) to provide 16 cores in total. These performance improvements on single GPU corresponds to four CPU cores, not compared with a single CPU core. It includes 53% performance gain with overlapping the communication between MPI processes with GPU calculation.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116131287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Cloud Bursting under FermiCloud","authors":"Hao Wu, Shangping Ren, G. Garzoglio, S. Timm, G. B. Altayo, Hyunwoo Kim, K. Chadwick, H. Jang, S. Noh","doi":"10.1109/ICPADS.2013.121","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.121","url":null,"abstract":"Cloud computing is changing the infrastructure upon which scientific computing depends from supercomputers and distributed computing clusters to a more elastic cloud-based structure. The service-oriented focus and elasticity of clouds can not only facilitate technology needs of emerging business but also shorten response time and reduce operational costs of traditional scientific applications. Fermi National Accelerator Laboratory (Fermilab) is currently in the process of building its own private cloud, FermiCloud, which allows the existing grid infrastructure to use dynamically provisioned resources on FermiCloud to accommodate increased but dynamic computation demand from scientists in the domains of High Energy Physics (HEP) and other research areas. Cloud infrastructure also allows to increase a private cloud's resource capacity through \"bursting\" by borrowing or renting resources from other community or commercial clouds when needed. This paper introduces a joint project on building a cloud federation to support HEP applications between Fermi National Accelerator Laboratory and Korea Institution of Science and Technology Information, with technical contributions from the Illinois Institute of Technology. In particular, this paper presents two recent accomplishments of the joint project: (a) cloud bursting automation and (b) load balancer. Automatic cloud bursting allows computer resources to be dynamically reconfigured to meet users' demands. The load balance algorithm which the cloud bursting depends on decides when and where new resources need to be allocated. Our preliminary prototyping and experiments have shown promising success, yet, they also have opened new challenges to be studied.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115509876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Accurate Indoor-Localization Using Passive RFID","authors":"X. Chen, Lei Xie, Chuyu Wang, Sanglu Lu","doi":"10.1109/ICPADS.2013.44","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.44","url":null,"abstract":"In many pervasive applications like the intelligent bookshelves in libraries, it is essential to accurately locate the items to provide the location-based service, e.g., the average localization error should be smaller than 50 cm and the localization delay should be within several seconds. Conventional indoor-localization schemes cannot provide such accurate localization results. In this paper, we design an adaptive, accurate indoor-localization scheme using passive RFID systems. We propose two adaptive solutions, i.e., the adaptive power stepping and the adaptive calibration, which can adaptively adjust the critical parameters and leverage the feedbacks to improve the localization accuracy. The realistic experiment results indicate that, our adaptive localization scheme can achieve an accuracy of 31 cm within 2.6 seconds on average.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127449313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous Possible K-Nearest Skyline Query in Euclidean Spaces","authors":"Yuan-Ko Huang, Zong-Han He, Chiang Lee, Wu-Hsiu Kuo","doi":"10.1109/ICPADS.2013.35","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.35","url":null,"abstract":"Continuous K-nearest skyline query (CKNSQ) is an important type of the spatio-temporal queries. Given a query time interval [ts, te] and a moving query object q, a CKNSQ is to retrieve the K-nearest skyline points of q at each time instant within [ts, te]. Different from the previous works, our work devotes to overcoming the past assumption that each object is static with certain dimensional values and located in road networks. In this paper, we focus on processing the CKNSQ over moving objects with uncertain dimensional values in Euclidean space and the velocity of each object (including the query object) varies within a known range. Such a query is called the continuous possible K-nearest skyline query (CPKNSQ). We first discuss the difficulties raised by the uncertainty of object and then propose the CPKNSQ algorithm operated with a data partitioning index, called the uncertain TPR-tree (UTPR-tree), to efficiently answer the CPKNSQ.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125622390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting Popularity and Adapting Replication of Internet Videos for High-Quality Delivery","authors":"Guthemberg Silvestre, Sébastien Monnet, David Buffoni, Pierre Sens","doi":"10.1109/ICPADS.2013.64","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.64","url":null,"abstract":"Content availability has become increasingly important for the Internet video delivery chain. To deliver videos with an outstanding availability and meet the increasing user expectations, content delivery networks (CDNs) must enforce strict QoS metrics, like bitrate and latency, through SLA contracts. Adaptive content replication has been seen as a promising way to achieve this goal. However, it remains unclear how to avoid waste of resources when strict SLA contracts must be enforced. In this work, we introduce Hermes, an adaptive replication scheme based on accurate predictions about the popularity of Internet videos. Simulations using popularity growth curves from YouTube traces suggest that our approach meets user expectations efficiently. Compared to a non-collaborative caching, Hermes reduces storage usage for replication by two orders of magnitude, and under heavy load conditions, it increases the average bitrate provision by roughly 90%. Moreover, it prevents SLA violations through an application-level deadline-aware mechanism.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126891075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPI-Interoperable Generalized Active Messages","authors":"Xin Zhao, P. Balaji, W. Gropp, R. Thakur","doi":"10.1109/ICPADS.2013.38","DOIUrl":"https://doi.org/10.1109/ICPADS.2013.38","url":null,"abstract":"Data-intensive applications have become increasingly important in recent years, yet traditional data movement approaches for scientific computation are not well suited for such applications. The Active Message (AM) model is an alternative communication paradigm that is better suited for such applications by allowing computation to be dynamically moved closer to data. Given the wide usage of MPI in scientific computing, enabling an MPI-interoperable AM paradigm would allow traditional applications to incrementally start utilizing AMs in portions of their applications, thus eliminating the programming effort of rewriting entire applications. In our previous work, we extended the MPI ACCUMULATE and MPI GET ACCUMULATE operations in the MPI standard to support AMs. However, the semantics of accumulate-style AMs are fundamentally restricted by the semantics of MPI ACCUMULATE and MPI GET ACCUMULATE, which were not designed to support the AM model. In this paper, we present a new generalized framework for MPI-interoperable AMs that can alleviate those restrictions, thus providing a richer semantics to accommodate a wide variety of application computational patterns. Together with a new API, we present a detailed description of the correctness semantics of this functionality and a reference implementation that demonstrates how various API choices affect the flexibility provided to the MPI implementation and consequently its performance.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128029036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Web Service Reputation from Integrated Social Service Network Model","authors":"Kyung-Ryul Kim, Jooik Jung, Sejin Chun, Gunhee Cho, Kyong-Ho Lee","doi":"10.1109/.91","DOIUrl":"https://doi.org/10.1109/.91","url":null,"abstract":"Social networks facilitate information sharing and communication among people who have common interests. Although social networks have been widely used to compute the reputations of Web services, they are limited in the ability to support the sophisticated estimation of service reputations. This paper proposes a sophisticated method of computing service reputations, which considers service requesters, raters and providers participating in the service ecosystem. In order to represent the interactions among the service participants, we also propose an integrated model which combines social and service network models. Based on the model, the proposed method calculates the credibility and expertise of the raters and providers of a service and applies them to the reputation of the service. Experimental results with real-world Web services show that the proposed method computes service reputations elaborately.","PeriodicalId":160979,"journal":{"name":"2013 International Conference on Parallel and Distributed Systems","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124946238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}