Yiyang Zhao, Chen Qian, Liangyi Gong, Zhenhua Li, Yunhao Liu
{"title":"LMDD: Light-Weight Magnetic-Based Door Detection with Your Smartphone","authors":"Yiyang Zhao, Chen Qian, Liangyi Gong, Zhenhua Li, Yunhao Liu","doi":"10.1109/ICPP.2015.101","DOIUrl":"https://doi.org/10.1109/ICPP.2015.101","url":null,"abstract":"Doors are important landmarks for indoor positioning systems. Hence an accurate and light-weight door detection approach is highly desired. The state-of-the-art solutions are either vision based or infrastructure based, which incur nontrivial device or management cost. This paper presents a novel approach, Light-weight Magnetic-based Door Detection (LMDD), which only relies on the information from built-in sensors of a smartphone. LMDD detects a door by analyzing the change of magnetic signal and extracting special features caused by doors. It is light-weight in both computation and infrastructure cost. We have implemented a prototype of LMDD that has been installed on various Android phones. Experimental results show that LMDD achieves door detection accuracy of 74% in average, ranging from 66% to 85% in various typical environments such as offices, classrooms, residential houses, and a hospital.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127772983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Executing Large Scale Scientific Workflow Ensembles in Public Clouds","authors":"Qingye Jiang, Young Choon Lee, Albert Y. Zomaya","doi":"10.1109/ICPP.2015.61","DOIUrl":"https://doi.org/10.1109/ICPP.2015.61","url":null,"abstract":"Scientists in different fields, such as high energy physics, earth science, and astronomy are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) for the entire scientific analysis. As a workflow ensemble usually contains many sub-workflows in each of which hundreds or thousands of jobs exist with precedence constraints, the execution of such a workflow ensemble makes a great concern with cost even using elastic and pay-as-you-go cloud resources. In this paper, we address two main challenges in executing large-scale workflow ensembles in public clouds with both cost and deadline constraints: (1) execution coordination, and (2) resource provisioning. To this end, we develop a new pulling based workflow execution system with a profiling-based resource provisioning strategy. The idea is homogeneity in both scientific workflows and cloud resources can be exploited to remove scheduling overhead (in execution coordination) and to minimize cost meeting deadline. Our results show that our solution system can achieve 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. Besides, our evaluation using Montage (an astronomical image mosaic engine) workflow ensembles on around 1000-core Amazon EC2 clusters has demonstrated the efficacy of our resource provisioning strategy in terms of cost effectiveness within deadline.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124427483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mingkai Huang, Dan He, Xianhua Liu, Mingxing Tan, Xu Cheng
{"title":"An Energy-Efficient Branch Prediction with Grouped Global History","authors":"Mingkai Huang, Dan He, Xianhua Liu, Mingxing Tan, Xu Cheng","doi":"10.1109/ICPP.2015.23","DOIUrl":"https://doi.org/10.1109/ICPP.2015.23","url":null,"abstract":"Branch prediction has been playing an increasingly important role in improving the performance and energy efficiency for modern microprocessors. The state-of-the-art branch predictors, such as the perceptron and TAGE predictors, leverage novel prediction algorithms to explore longer branch history for higher prediction accuracy. We observe that as the branch history is becoming longer, the efficiency of global history is degraded by the interference of different branch instructions. In order to mitigate the excessive influence of the branch history interference, we propose the Grouped Global History (GGH) based branch predictor, a lightweight yet efficient branch predictor. Unlike existing branch predictors that make use of a unified global history for prediction, GGH divides the global history into a set of subgroups such that the interference resulted by frequently executed branch instructions could be restricted. With subgroups of global history, GGH also enables us to track even longer effective branch correlation without introducing hardware storage overhead. Our experimental results based on SPEC CINT 2006 workloads demonstrate that our approach can significantly reduce the branch mispredictions per kilo instructions (MPKI) by 4.76 over the baseline perceptron predictor, with a simple control logic extension.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132614767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DPX10: An Efficient X10 Framework for Dynamic Programming Applications","authors":"Chen Wang, Ce Yu, Ji-zhou Sun, X. Meng","doi":"10.1109/ICPP.2015.96","DOIUrl":"https://doi.org/10.1109/ICPP.2015.96","url":null,"abstract":"X10 language and Asynchronous Partitioned Global Address Space (APGAS) model is an emerging mechanism for programming high-performance computers and commodity clusters. However, little work exists on distributed programming framework for dynamic programming (DP) problems based on X10 and APGAS model. In this paper we present DPX10, an efficient distributed X10 framework for DP applications. DPX10 enables developers to write highly efficient DP programs without much effort. A DPX10 program is specified by a directed acyclic graph (DAG) pattern and a compute method for the vertices. DPX10 provides eight commonly used DAG patterns and a simple API to create custom patterns. The system handles all the tiresome work of implementing parallelization including DAG distribution, vertices scheduling, and vertices communication. Moreover, a new recovery method for distributed arrays is developed to provide transparent fault tolerance. We describe the design of the framework and use four DP applications with up to a billion vertices on 120 cores to demonstrate its simplicity, efficiency, and scalability.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128394689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shiyi Li, Q. Cao, Shenggang Wan, Wenhui Zhang, C. Xie, Xubin He, P. Subedi
{"title":"PPM: A Partitioned and Parallel Matrix Algorithm to Accelerate Encoding/Decoding Process of Asymmetric Parity Erasure Codes","authors":"Shiyi Li, Q. Cao, Shenggang Wan, Wenhui Zhang, C. Xie, Xubin He, P. Subedi","doi":"10.1109/ICPP.2015.55","DOIUrl":"https://doi.org/10.1109/ICPP.2015.55","url":null,"abstract":"Erasure codes are widely deployed in storage systems and the encoding/decoding process is a common operation in erasure-coded systems. Parity-check matrix method is a general method employed in erasure codes to conduct encoding/decoding process. However, the process is serial and generates high computational cost in dealing with matrix operations, and hence, causes low encoding/decoding performance. Especially for some recently proposed erasure codes, including SD code, PMDS code, and LRC code, the disadvantages are more obvious. To address this issue, in this paper, we present an optimization algorithm, called Partitioned and Parallel Matrix (PPM) algorithm, to accelerate the encoding/decoding processes of these codes by partitioning the parity-check matrix, parallelizing the encoding/decoding operations, and optimizing the calculation sequence, so as to achieve the goal of fast encoding/decoding. Experimental results show that PPM can speed up the encoding/decoding process of these codes by up to 210.81%.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131923430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei Wang, Allan Porterfield, John Cavazos, Sridutt Bhalachandra
{"title":"Using Per-Loop CPU Clock Modulation for Energy Efficiency in OpenMP Applications","authors":"Wei Wang, Allan Porterfield, John Cavazos, Sridutt Bhalachandra","doi":"10.1109/ICPP.2015.72","DOIUrl":"https://doi.org/10.1109/ICPP.2015.72","url":null,"abstract":"As the HPC community moves into the exascale computing era, application energy is becoming as large of a concern as performance. Optimizing for energy will be essential in the effort to overcome the limited power envelope. Existing efforts to optimize energy in applications employ Dynamic Frequency and Voltage Scaling (DVFS) to maximize energy savings in less compute-intensive regions or non-critical execution paths. However, we found that DVFS has high power state switching overhead, preventing its use when a more fine-grained technique is necessary. In this work, we take advantage of the low transition overhead of CPU clock modulation and apply it to fine-grained OpenMP parallel loops. The energy behavior of OpenMP parallel regions is first characterized by changing the effective frequency using clock modulation. The clock modulation setting that achieves the best energy efficiency is then determined for each region. Finally, different CPU clock modulation settings are applied to the different loops within the same application. The resulting multi-frequency execution of OpenMP applications achieves better energy-delay trade-off than any single frequency setting. In the best case scenario, the multi-frequency approach achieved 8.6% energy savings with less than 1.5% execution time increase. Concurrency throttling (i.e., reducing the number of hardware threads used by an application) saves more energy and can be combined with CPU clock modulation. Using both, we see savings of 21% energy and improvement of energy-delay product (EDP) by 16%.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124568502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Efficient and Delay-Constrained Broadcast in Time-Varying Energy-Demand Graphs","authors":"Chenxi Qiu, Haiying Shen, Lei Yu","doi":"10.1109/ICPP.2015.28","DOIUrl":"https://doi.org/10.1109/ICPP.2015.28","url":null,"abstract":"In this paper, we study the minimum energy broadcast problem in time-varying graphs (TVGs), which are a very useful high level abstraction for studying highly dynamic wireless networks. To this end, we first incorporate a channel model, called energy-demand functions, to the current TVGs, namely time-varying energy-demand graphs (TVEGs). Based on this model, we formulate the problem: given a TVEG, what is the optimal schedule (i.e., which nodes should forward a packet in what times and at what power levels) to minimize the energy consumption of the broadcast? We prove the problem to be NP-hard and o(log N) inapproximable. It is a challenge to find a solution for this problem on continuous time. Fortunately, we prove that the problem on continuous time is equivalent to the problem on certain discrete time points, called discrete time set (DTS). Based on this property, we propose polynomial time solutions for this problem with different channel models, and evaluate the performance of these methods from real-life contact traces.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121248308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Testing Engine for High-Performance and Cost-Effective Workflow Execution in the Cloud","authors":"V. Pallipuram, Trilce Estrada, M. Taufer","doi":"10.1109/ICPP.2015.94","DOIUrl":"https://doi.org/10.1109/ICPP.2015.94","url":null,"abstract":"While pursuing high performance and cost effectiveness for directed acyclic graph (DAG)-structured scientific workflow executions in the cloud, it is critical to identify appropriate resource instances and their quantity. This paper presents a testing engine that employs a resource-selection heuristic, which statically analyzes the DAG structure to guide the selection of resource instances, how many and which ones. The testing engine combines the heuristic with two platform-independent DAG-scheduling policies, the Area-oriented DAG-scheduling heuristic (AO) and the Locally-Optimal heuristic (L-OPT), to perform extensive validation assessments. The testing engine ensures the realism of these assessments by modeling the performance variability of the cloud platform using real traces. The testing engine also enables cost-effectiveness analysis that guides users to select a small set of instance candidates that provide performance-cost trade off. Our empirical results show that the pairing of the resource-selection heuristic with AO scheduling policy is a powerful method for cost-effective DAG-structured workflow execution in the cloud.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115371644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Nelson, Axel Rivera, Prasanna Balaprakash, Mary W. Hall, P. Hovland, E. Jessup, B. Norris
{"title":"Generating Efficient Tensor Contractions for GPUs","authors":"T. Nelson, Axel Rivera, Prasanna Balaprakash, Mary W. Hall, P. Hovland, E. Jessup, B. Norris","doi":"10.1109/ICPP.2015.106","DOIUrl":"https://doi.org/10.1109/ICPP.2015.106","url":null,"abstract":"Many scientific and numerical applications, including quantum chemistry modeling and fluid dynamics simulation, require tensor product and tensor contraction evaluation. Tensor computations are characterized by arrays with numerous dimensions, inherent parallelism, moderate data reuse and many degrees of freedom in the order in which to perform the computation. The best-performing implementation is heavily dependent on the tensor dimensionality and the target architecture. In this paper, we map tensor computations to GPUs, starting with a high-level tensor input language and producing efficient CUDA code as output. Our approach is to combine tensor-specific mathematical transformations with a GPU decision algorithm, machine learning and autotuning of a large parameter space. Generated code shows significant performance gains over sequential and OpenMP parallel code, and a comparison with OpenACC shows the importance of autotuning and other optimizations in our framework for achieving efficient results.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"35 13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131479796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuo Yang, Fan Wu, Shaojie Tang, Xiaofeng Gao, Bo Yang, Guihai Chen
{"title":"Good Work Deserves Good Pay: A Quality-Based Surplus Sharing Method for Participatory Sensing","authors":"Shuo Yang, Fan Wu, Shaojie Tang, Xiaofeng Gao, Bo Yang, Guihai Chen","doi":"10.1109/ICPP.2015.47","DOIUrl":"https://doi.org/10.1109/ICPP.2015.47","url":null,"abstract":"Participatory sensing has become a novel and promising paradigm in environmental data collection. However, the issue of data quality has not been carefully addressed. Low quality data contributions may undermine the effectiveness and prospects of participatory sensing, and thus motivates the need for approaches to guarantee the high quality of the contributed data. In this paper, we integrate quality estimation and monetary incentive, and propose a quality-based surplus sharing method for participatory sensing. Specifically, we design an unsupervised learning approach to quantify the users' data qualities and long-term reputations, and exploit an outlier detection technique to filter out anomalous data items. Furthermore, we model the process of surplus sharing as a cooperative game, and propose a Shapley value-based method to determine each user's payment. We have conducted a participatory sensing experiment, and the experiment results show that our approach achieves good performance in terms of both quality estimation and surplus sharing.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130451644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}