2015 44th International Conference on Parallel Processing: Latest Papers

LMDD: Light-Weight Magnetic-Based Door Detection with Your Smartphone
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.101
Yiyang Zhao, Chen Qian, Liangyi Gong, Zhenhua Li, Yunhao Liu
{"title":"LMDD: Light-Weight Magnetic-Based Door Detection with Your Smartphone","authors":"Yiyang Zhao, Chen Qian, Liangyi Gong, Zhenhua Li, Yunhao Liu","doi":"10.1109/ICPP.2015.101","DOIUrl":"https://doi.org/10.1109/ICPP.2015.101","url":null,"abstract":"Doors are important landmarks for indoor positioning systems. Hence an accurate and light-weight door detection approach is highly desired. The state-of-the-art solutions are either vision based or infrastructure based, which incur nontrivial device or management cost. This paper presents a novel approach, Light-weight Magnetic-based Door Detection (LMDD), which only relies on the information from built-in sensors of a smartphone. LMDD detects a door by analyzing the change of magnetic signal and extracting special features caused by doors. It is light-weight in both computation and infrastructure cost. We have implemented a prototype of LMDD that has been installed on various Android phones. Experimental results show that LMDD achieves door detection accuracy of 74% in average, ranging from 66% to 85% in various typical environments such as offices, classrooms, residential houses, and a hospital.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127772983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Executing Large Scale Scientific Workflow Ensembles in Public Clouds
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.61
Qingye Jiang, Young Choon Lee, Albert Y. Zomaya
{"title":"Executing Large Scale Scientific Workflow Ensembles in Public Clouds","authors":"Qingye Jiang, Young Choon Lee, Albert Y. Zomaya","doi":"10.1109/ICPP.2015.61","DOIUrl":"https://doi.org/10.1109/ICPP.2015.61","url":null,"abstract":"Scientists in different fields, such as high energy physics, earth science, and astronomy are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) for the entire scientific analysis. As a workflow ensemble usually contains many sub-workflows in each of which hundreds or thousands of jobs exist with precedence constraints, the execution of such a workflow ensemble makes a great concern with cost even using elastic and pay-as-you-go cloud resources. In this paper, we address two main challenges in executing large-scale workflow ensembles in public clouds with both cost and deadline constraints: (1) execution coordination, and (2) resource provisioning. To this end, we develop a new pulling based workflow execution system with a profiling-based resource provisioning strategy. The idea is homogeneity in both scientific workflows and cloud resources can be exploited to remove scheduling overhead (in execution coordination) and to minimize cost meeting deadline. Our results show that our solution system can achieve 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. Besides, our evaluation using Montage (an astronomical image mosaic engine) workflow ensembles on around 1000-core Amazon EC2 clusters has demonstrated the efficacy of our resource provisioning strategy in terms of cost effectiveness within deadline.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124427483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
An Energy-Efficient Branch Prediction with Grouped Global History
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.23
Mingkai Huang, Dan He, Xianhua Liu, Mingxing Tan, Xu Cheng
{"title":"An Energy-Efficient Branch Prediction with Grouped Global History","authors":"Mingkai Huang, Dan He, Xianhua Liu, Mingxing Tan, Xu Cheng","doi":"10.1109/ICPP.2015.23","DOIUrl":"https://doi.org/10.1109/ICPP.2015.23","url":null,"abstract":"Branch prediction has been playing an increasingly important role in improving the performance and energy efficiency for modern microprocessors. The state-of-the-art branch predictors, such as the perceptron and TAGE predictors, leverage novel prediction algorithms to explore longer branch history for higher prediction accuracy. We observe that as the branch history is becoming longer, the efficiency of global history is degraded by the interference of different branch instructions. In order to mitigate the excessive influence of the branch history interference, we propose the Grouped Global History (GGH) based branch predictor, a lightweight yet efficient branch predictor. Unlike existing branch predictors that make use of a unified global history for prediction, GGH divides the global history into a set of subgroups such that the interference resulted by frequently executed branch instructions could be restricted. With subgroups of global history, GGH also enables us to track even longer effective branch correlation without introducing hardware storage overhead. Our experimental results based on SPEC CINT 2006 workloads demonstrate that our approach can significantly reduce the branch mispredictions per kilo instructions (MPKI) by 4.76 over the baseline perceptron predictor, with a simple control logic extension.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132614767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
DPX10: An Efficient X10 Framework for Dynamic Programming Applications
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.96
Chen Wang, Ce Yu, Ji-zhou Sun, X. Meng
{"title":"DPX10: An Efficient X10 Framework for Dynamic Programming Applications","authors":"Chen Wang, Ce Yu, Ji-zhou Sun, X. Meng","doi":"10.1109/ICPP.2015.96","DOIUrl":"https://doi.org/10.1109/ICPP.2015.96","url":null,"abstract":"X10 language and Asynchronous Partitioned Global Address Space (APGAS) model is an emerging mechanism for programming high-performance computers and commodity clusters. However, little work exists on distributed programming framework for dynamic programming (DP) problems based on X10 and APGAS model. In this paper we present DPX10, an efficient distributed X10 framework for DP applications. DPX10 enables developers to write highly efficient DP programs without much effort. A DPX10 program is specified by a directed acyclic graph (DAG) pattern and a compute method for the vertices. DPX10 provides eight commonly used DAG patterns and a simple API to create custom patterns. The system handles all the tiresome work of implementing parallelization including DAG distribution, vertices scheduling, and vertices communication. Moreover, a new recovery method for distributed arrays is developed to provide transparent fault tolerance. We describe the design of the framework and use four DP applications with up to a billion vertices on 120 cores to demonstrate its simplicity, efficiency, and scalability.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128394689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
PPM: A Partitioned and Parallel Matrix Algorithm to Accelerate Encoding/Decoding Process of Asymmetric Parity Erasure Codes
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.55
Shiyi Li, Q. Cao, Shenggang Wan, Wenhui Zhang, C. Xie, Xubin He, P. Subedi
{"title":"PPM: A Partitioned and Parallel Matrix Algorithm to Accelerate Encoding/Decoding Process of Asymmetric Parity Erasure Codes","authors":"Shiyi Li, Q. Cao, Shenggang Wan, Wenhui Zhang, C. Xie, Xubin He, P. Subedi","doi":"10.1109/ICPP.2015.55","DOIUrl":"https://doi.org/10.1109/ICPP.2015.55","url":null,"abstract":"Erasure codes are widely deployed in storage systems and the encoding/decoding process is a common operation in erasure-coded systems. Parity-check matrix method is a general method employed in erasure codes to conduct encoding/decoding process. However, the process is serial and generates high computational cost in dealing with matrix operations, and hence, causes low encoding/decoding performance. Especially for some recently proposed erasure codes, including SD code, PMDS code, and LRC code, the disadvantages are more obvious. To address this issue, in this paper, we present an optimization algorithm, called Partitioned and Parallel Matrix (PPM) algorithm, to accelerate the encoding/decoding processes of these codes by partitioning the parity-check matrix, parallelizing the encoding/decoding operations, and optimizing the calculation sequence, so as to achieve the goal of fast encoding/decoding. Experimental results show that PPM can speed up the encoding/decoding process of these codes by up to 210.81%.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131923430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Using Per-Loop CPU Clock Modulation for Energy Efficiency in OpenMP Applications
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.72
Wei Wang, Allan Porterfield, John Cavazos, Sridutt Bhalachandra
{"title":"Using Per-Loop CPU Clock Modulation for Energy Efficiency in OpenMP Applications","authors":"Wei Wang, Allan Porterfield, John Cavazos, Sridutt Bhalachandra","doi":"10.1109/ICPP.2015.72","DOIUrl":"https://doi.org/10.1109/ICPP.2015.72","url":null,"abstract":"As the HPC community moves into the exascale computing era, application energy is becoming as large of a concern as performance. Optimizing for energy will be essential in the effort to overcome the limited power envelope. Existing efforts to optimize energy in applications employ Dynamic Frequency and Voltage Scaling (DVFS) to maximize energy savings in less compute-intensive regions or non-critical execution paths. However, we found that DVFS has high power state switching overhead, preventing its use when a more fine-grained technique is necessary. In this work, we take advantage of the low transition overhead of CPU clock modulation and apply it to fine-grained OpenMP parallel loops. The energy behavior of OpenMP parallel regions is first characterized by changing the effective frequency using clock modulation. The clock modulation setting that achieves the best energy efficiency is then determined for each region. Finally, different CPU clock modulation settings are applied to the different loops within the same application. The resulting multi-frequency execution of OpenMP applications achieves better energy-delay trade-off than any single frequency setting. In the best case scenario, the multi-frequency approach achieved 8.6% energy savings with less than 1.5% execution time increase. Concurrency throttling (i.e., reducing the number of hardware threads used by an application) saves more energy and can be combined with CPU clock modulation. Using both, we see savings of 21% energy and improvement of energy-delay product (EDP) by 16%.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124568502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Energy-Efficient and Delay-Constrained Broadcast in Time-Varying Energy-Demand Graphs
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.28
Chenxi Qiu, Haiying Shen, Lei Yu
{"title":"Energy-Efficient and Delay-Constrained Broadcast in Time-Varying Energy-Demand Graphs","authors":"Chenxi Qiu, Haiying Shen, Lei Yu","doi":"10.1109/ICPP.2015.28","DOIUrl":"https://doi.org/10.1109/ICPP.2015.28","url":null,"abstract":"In this paper, we study the minimum energy broadcast problem in time-varying graphs (TVGs), which are a very useful high level abstraction for studying highly dynamic wireless networks. To this end, we first incorporate a channel model, called energy-demand functions, to the current TVGs, namely time-varying energy-demand graphs (TVEGs). Based on this model, we formulate the problem: given a TVEG, what is the optimal schedule (i.e., which nodes should forward a packet in what times and at what power levels) to minimize the energy consumption of the broadcast? We prove the problem to be NP-hard and o(log N) inapproximable. It is a challenge to find a solution for this problem on continuous time. Fortunately, we prove that the problem on continuous time is equivalent to the problem on certain discrete time points, called discrete time set (DTS). Based on this property, we propose polynomial time solutions for this problem with different channel models, and evaluate the performance of these methods from real-life contact traces.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121248308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Testing Engine for High-Performance and Cost-Effective Workflow Execution in the Cloud
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.94
V. Pallipuram, Trilce Estrada, M. Taufer
{"title":"A Testing Engine for High-Performance and Cost-Effective Workflow Execution in the Cloud","authors":"V. Pallipuram, Trilce Estrada, M. Taufer","doi":"10.1109/ICPP.2015.94","DOIUrl":"https://doi.org/10.1109/ICPP.2015.94","url":null,"abstract":"While pursuing high performance and cost effectiveness for directed acyclic graph (DAG)-structured scientific workflow executions in the cloud, it is critical to identify appropriate resource instances and their quantity. This paper presents a testing engine that employs a resource-selection heuristic, which statically analyzes the DAG structure to guide the selection of resource instances, how many and which ones. The testing engine combines the heuristic with two platform-independent DAG-scheduling policies, the Area-oriented DAG-scheduling heuristic (AO) and the Locally-Optimal heuristic (L-OPT), to perform extensive validation assessments. The testing engine ensures the realism of these assessments by modeling the performance variability of the cloud platform using real traces. The testing engine also enables cost-effectiveness analysis that guides users to select a small set of instance candidates that provide performance-cost trade off. Our empirical results show that the pairing of the resource-selection heuristic with AO scheduling policy is a powerful method for cost-effective DAG-structured workflow execution in the cloud.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115371644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Generating Efficient Tensor Contractions for GPUs
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.106
T. Nelson, Axel Rivera, Prasanna Balaprakash, Mary W. Hall, P. Hovland, E. Jessup, B. Norris
{"title":"Generating Efficient Tensor Contractions for GPUs","authors":"T. Nelson, Axel Rivera, Prasanna Balaprakash, Mary W. Hall, P. Hovland, E. Jessup, B. Norris","doi":"10.1109/ICPP.2015.106","DOIUrl":"https://doi.org/10.1109/ICPP.2015.106","url":null,"abstract":"Many scientific and numerical applications, including quantum chemistry modeling and fluid dynamics simulation, require tensor product and tensor contraction evaluation. Tensor computations are characterized by arrays with numerous dimensions, inherent parallelism, moderate data reuse and many degrees of freedom in the order in which to perform the computation. The best-performing implementation is heavily dependent on the tensor dimensionality and the target architecture. In this paper, we map tensor computations to GPUs, starting with a high-level tensor input language and producing efficient CUDA code as output. Our approach is to combine tensor-specific mathematical transformations with a GPU decision algorithm, machine learning and autotuning of a large parameter space. Generated code shows significant performance gains over sequential and OpenMP parallel code, and a comparison with OpenACC shows the importance of autotuning and other optimizations in our framework for achieving efficient results.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"35 13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131479796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 37
Good Work Deserves Good Pay: A Quality-Based Surplus Sharing Method for Participatory Sensing
2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.47
Shuo Yang, Fan Wu, Shaojie Tang, Xiaofeng Gao, Bo Yang, Guihai Chen
{"title":"Good Work Deserves Good Pay: A Quality-Based Surplus Sharing Method for Participatory Sensing","authors":"Shuo Yang, Fan Wu, Shaojie Tang, Xiaofeng Gao, Bo Yang, Guihai Chen","doi":"10.1109/ICPP.2015.47","DOIUrl":"https://doi.org/10.1109/ICPP.2015.47","url":null,"abstract":"Participatory sensing has become a novel and promising paradigm in environmental data collection. However, the issue of data quality has not been carefully addressed. Low quality data contributions may undermine the effectiveness and prospects of participatory sensing, and thus motivates the need for approaches to guarantee the high quality of the contributed data. In this paper, we integrate quality estimation and monetary incentive, and propose a quality-based surplus sharing method for participatory sensing. Specifically, we design an unsupervised learning approach to quantify the users' data qualities and long-term reputations, and exploit an outlier detection technique to filter out anomalous data items. Furthermore, we model the process of surplus sharing as a cooperative game, and propose a Shapley value-based method to determine each user's payment. We have conducted a participatory sensing experiment, and the experiment results show that our approach achieves good performance in terms of both quality estimation and surplus sharing.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130451644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10