{"title":"Improving Application Concurrency on GPUs by Managing Implicit and Explicit Synchronizations","authors":"M. Butler, Kittisak Sajjapongse, M. Becchi","doi":"10.1109/ICPADS.2015.73","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.73","url":null,"abstract":"Originally designed to be used as dedicated coprocessors, GPUs have progressively become part of shared computing environments, such as HPC servers and clusters. Commonly used GPU software stacks (e.g., CUDA and OpenCL), however, are designed for the dedicated use of GPUs by a single application, possibly leading to resource underutilization when multiple applications share the GPU resources. In recent years, several node-level runtime components have been proposed to target this problem and allow the efficient sharing of GPUs among concurrent applications. The concurrency enabled by these systems, however, is limited by synchronizations embedded in the applications or implicitly introduced by the GPU software stack. This work targets this problem. We first analyze the effect of explicit and implicit synchronizations on application concurrency and GPU utilization. We then design runtime mechanisms to bypass these synchronizations, along with a memory management scheme that can be integrated with these synchronization avoidance mechanisms to improve GPU utilization and system throughput. We integrate these mechanisms into a recently proposed GPU virtualization runtime named Sync-Free GPU (SF-GPU), thus removing unnecessary blockages caused by multitenancy, ensuring any two applications running on the same device experience limited to no interference, maximizing the level of concurrency supported. We also release our mechanisms in the form of a software API that can be used by programmers to improve the performance of their applications without modifying their code. Finally, we evaluate the impact of our proposed mechanisms on applications run in isolation and concurrently.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131102816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PEVTS: Privacy-Preserving Electric Vehicles Test-Bedding Scheme","authors":"Xu Yang, Joseph K. Liu, Wei Wu, M. Au, W. Susilo","doi":"10.1109/ICPADS.2015.43","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.43","url":null,"abstract":"Electric Vehicle (EV) infrastructure is relatively new in many countries. Due to the recency of an EV infrastructure, it is important to carry out a series of testing programs. Furthermore, authenticity for collection of data is necessary for testing programs in order to provide accurate results. At the same time, user privacy should not cease since tracing one's daily logistic movements or behaviour from the EV testing programs means breaching one's privacy. In this paper, we propose a novel solution PEVTS for enabling both data authenticity and user privacy concurrently. Our proposed system provides great flexibility to the authority to choose any arbitrary set of authenticated users for testing in every time period. At the same time, it provides anonymity for all participating users. Yet it can trace any vehicle within a time period for statistical purpose. We give a detailed description of our system. We also implement the prototype of our system to show its practicality.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127826907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable and Fault-Tolerant Cloud Computations: Modelling and Implementation","authors":"M. Spichkova, Ian E. Thomas, H. Schmidt, I. I. Yusuf, D. Drumm, S. Androulakis, G. Opletal, S. Russo","doi":"10.1109/ICPADS.2015.57","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.57","url":null,"abstract":"This paper presents a formal model for science clouds, capable of predicting and controlling resources scalably, as well as its implementation as an open source solution, called Chiminey. The feasibility of Chiminey is shown using case studies on biophysics and structural chemistry computations. Big data is acquired from scientific instruments such as synchrotrons and atomic force microscopes. The model takes into account the architecture of the overall parallel and distributed system including large-scale data sources; data sinks, for example petabyte research data stores; and cluster or cloud virtual resources and infrastructures characterised by users in simple parameters upfront. Chiminey is developed to control large numbers of processes and to provide a reliable computing and data management, which can be used by researchers without having to learn extensive infrastructure concepts and technologies.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131819029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Aware Caching","authors":"Wei Zhang, Rui Fan, Fang Liu, Pan Lai","doi":"10.1109/ICPADS.2015.66","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.66","url":null,"abstract":"To achieve higher performance, cache sizes have been steadily increasing in computer processors and network systems. But caches are often over-provisioned for peak demand and underutilized in typical non-peak workloads. As caches consume substantial power, this results in significant amounts of wasted energy. To address this, existing works turn off parts of the cache when they do not contribute to higher performance. However, while these methods are effective empirically, they lack provable performance bounds. In addition, existing works focus on processor caches and are not applicable to network caches where data size and cost can vary. In this paper, we study the energy-aware caching (EAC) problem, and seek to minimize the total cost incurred due to cache misses and energy consumption. We propose three algorithms to solve different variants of this problem. The first is an optimal offline algorithm that runs in O(kn log n) time for a size k cache and n cache accesses. Then, we propose a simple online algorithm for uniform data size and cost that is 2 + h/(k-h+1) competitive compared to an optimal algorithm with a size h ≤ k cache. Lastly, we propose a 2 + (h-1)/(k-h+1) competitive online algorithm that allows arbitrary data sizes and costs. We give an efficient implementation of the algorithm that takes O(log k) amortized time per cache access, and also present an adaptive version that reacts to workload patterns to achieve better real-world performance. Using trace driven simulations, we show our algorithm has substantially lower cost than algorithms focused on maximizing cache hit rates or minimizing energy usage alone.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131870695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resource Provision for Batch and Interactive Workloads in Data Centers","authors":"Ting-Wei Chang, Ching-Chi Lin, Pangfeng Liu, Jan-Jan Wu, Chia-Chun Shih, Chao-Wen Huang","doi":"10.1109/ICPADS.2015.60","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.60","url":null,"abstract":"In this paper we describe a scheduling framework that allocates resources to both batch jobs and interactive jobs simultaneously in a private cloud with a static amount of resources. In the system, every job has an individual service level agreement (SLA), and violating the SLA incurs penalty. We propose a model to formally quantify the SLA violation penalty of both batch and interactive jobs. The analysis on the interactive jobs focuses on queuing analysis and response time. The analysis on batch jobs focuses on the non-preemptive job scheduling for multiple processing units. Based on this model we also propose algorithms to estimate the penalty for both batch jobs and interactive jobs, and algorithms that reduce the total SLA violation penalty. Our experiment results suggest that our system effectively reduces the total penalty by allocating the right amount of resources to heterogeneous jobs in a private cloud system.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114459118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building Blocks for a System-Wide Power and Thermal Management Framework","authors":"Ananta Tiwari, Adam Jundt, W. A. Ward, R. Campbell, L. Carrington","doi":"10.1109/ICPADS.2015.93","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.93","url":null,"abstract":"Next generation Exascale systems face the difficult challenge of managing the power and thermal constraints that come from packaging more transistors into a smaller space while adding more processors into a single system. To combat this, HPC center operators are looking for methodologies to save operational energy. Energy consumption in an HPC center is governed by the complex interactions between a number of different components. Without a coordinated and system-wide perspective on reducing energy consumption, isolated actions taken on one component with the intent to lower energy consumption can actually have the opposite effect on another component, thereby canceling out the net effect. For example, increasing the setpoint (or ambient temperature) to save cooling energy can lead to increased compute-node fan power and increased chip leakage power. This paper presents the building blocks required to develop and implement a system-wide framework that can take a coordinated approach to enact thermal and power management decisions at compute-node (e.g., CPU speed throttling) and infrastructure levels (e.g., selecting optimal setpoint). These building blocks consist of a suite of models that inform the thermal and power footprint of different computations, and present relationships between computational properties and datacenter operating conditions.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115124407","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conceptual Survey on Data Stream Processing Systems","authors":"Guenter Hesse, M. Lorenz","doi":"10.1109/ICPADS.2015.106","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.106","url":null,"abstract":"The present paper gives an overview about the state of the art technology within the area of data stream processing systems. Although the area of stream processing systems is not new, it is receiving a greater interest in the light of current business trends like the Internet of Things (IoT). The comparison of systems thereby includes several aspects such as a look into their architectures as well as into the responsibilities of the corresponding system components. A ranking or recommendations for one or more system(s) is not part of the work.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125823206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decentralized Dynamic Participation in Participatory Sensing: A Correlated-Equilibrium Game Approach","authors":"Yanmin Zhu, Liqun Huang, Yao Sun","doi":"10.1109/ICPADS.2015.40","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.40","url":null,"abstract":"Participatory sensing has become a compelling and viable paradigm for large scale sensing data collection in recent years, with smartphone workers recruited from a public crowd. Since the quality of the sensing data is directly related to the performance of the workers, it is then crucial to enable effective participation for a large number of workers. In this paper, we propose a decentralized dynamic participation game framework for participatory sensing systems with heterogeneous sensing processes and smartphone workers. Based on economic studies, we characterize the behaviour of workers with the idea of correlated equilibrium. We propose a regret matching based participation algorithm to track the set of correlated equilibria in a distributed manner, where the knowledge of competitors' preference information is not required. Simulation results demonstrate that our algorithm achieves good convergence and effective participation.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115872715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Analytical Models to Bootstrap Machine Learning Performance Predictors","authors":"Diego Didona, P. Romano","doi":"10.1109/ICPADS.2015.58","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.58","url":null,"abstract":"Performance modeling is a crucial technique to enable the vision of elastic computing in cloud environments. Conventional approaches to performance modeling rely on two antithetic methodologies: white box modeling, which exploits knowledge on system's internals and capture its dynamics using analytical approaches, and black box techniques, which infer relations among the input and output variables of a system based on the evidences gathered during an initial training phase. In this paper we investigate a technique, which we name Bootstrapping, which aims at reconciling these two methodologies and at compensating the cons of the one with the pros of the other. We analyze the design space of this gray box modeling technique, and identify a number of algorithmic and parametric trade-offs which we evaluate via two realistic case studies, a Key-Value Store and a Total Order Broadcast service.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124041698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Participant-Density-Aware Privacy-Preserving Aggregate Statistics for Mobile Crowd-Sensing","authors":"Jianwei Chen, Huadong Ma, David S. L. Wei, Dong Zhao","doi":"10.1109/ICPADS.2015.26","DOIUrl":"https://doi.org/10.1109/ICPADS.2015.26","url":null,"abstract":"Mobile crowd-sensing applications produce useful knowledge of the surrounding environment, which makes our life more predictable. However, these applications often require people to contribute, consciously or unconsciously, location-related data for analysis, and this gravely encroaches users' location privacy. Aggregate processing is a feasible way for preserving user privacy to some extent, and based on the mode, some privacy-preserving schemes have been proposed. However, existing schemes still cannot guarantee users' location privacy in the scenarios with low density participants. Meanwhile, user accountability also needs to be considered comprehensively to protect the system from malicious users. In this paper, we propose a participant-density-aware privacy-preserving aggregate statistics scheme for mobile crowd-sensing applications. In our scheme, we make use of multi-pseudonym mechanism to overcome the vulnerability due to low participant density. To further handle sybil attacks, based on the Paillier cryptosystem and non-interactive zero-knowledge verification, we advance and improve our solution framework, which also covers the problem of user accountability. Finally, the theoretical analysis indicates that our scheme achieves the desired properties, and the performance experiments demonstrate that our scheme can achieve a balance among accuracy, privacy-protection and computational overhead.","PeriodicalId":231517,"journal":{"name":"2015 IEEE 21st International Conference on Parallel and Distributed Systems (ICPADS)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127632103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}