2015 44th International Conference on Parallel Processing: Latest Papers, Page 3

LMDD: Light-Weight Magnetic-Based Door Detection with Your Smartphone

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.101

Yiyang Zhao, Chen Qian, Liangyi Gong, Zhenhua Li, Yunhao Liu

Citations: 14

Executing Large Scale Scientific Workflow Ensembles in Public Clouds

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.61

Qingye Jiang, Young Choon Lee, Albert Y. Zomaya

Abstract: Scientists in different fields, such as high energy physics, earth science, and astronomy, are developing large-scale workflow applications. In many use cases, scientists need to run a set of interrelated but independent workflows (i.e., workflow ensembles) for the entire scientific analysis. As a workflow ensemble usually contains many sub-workflows, each with hundreds or thousands of jobs subject to precedence constraints, executing such an ensemble raises significant cost concerns even with elastic, pay-as-you-go cloud resources. In this paper, we address two main challenges in executing large-scale workflow ensembles in public clouds with both cost and deadline constraints: (1) execution coordination, and (2) resource provisioning. To this end, we develop a new pull-based workflow execution system with a profiling-based resource provisioning strategy. The idea is that homogeneity in both scientific workflows and cloud resources can be exploited to remove scheduling overhead (in execution coordination) and to minimize cost while meeting deadlines. Our results show that our system can achieve 80% speed-up, by removing scheduling overhead, compared to the well-known Pegasus workflow management system when running scientific workflow ensembles. In addition, our evaluation using Montage (an astronomical image mosaic engine) workflow ensembles on Amazon EC2 clusters of around 1000 cores has demonstrated the efficacy of our resource provisioning strategy in terms of cost effectiveness within the deadline.
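
The pull-based execution idea mentioned in this abstract can be illustrated with a small sketch. This is not the authors' system: it only assumes that identical workers pull ready jobs from a shared queue, and that a job becomes ready once all of its parents have finished, so no central scheduler assigns jobs to workers. The function name `run_workflow`, the job names, and the worker count are hypothetical, and the dependency graph is assumed to be acyclic.

```python
import queue
import threading


def run_workflow(deps, num_workers=4):
    """Run jobs with precedence constraints using pull-based workers.

    deps maps each job name to the set of its parent jobs (assumed acyclic).
    Returns the order in which jobs completed.
    """
    if not deps:
        return []

    # Count unfinished parents per job and index children for quick release.
    remaining = {job: len(parents) for job, parents in deps.items()}
    children = {job: [] for job in deps}
    for job, parents in deps.items():
        for parent in parents:
            children[parent].append(job)

    ready = queue.Queue()
    for job, count in remaining.items():
        if count == 0:
            ready.put(job)  # jobs without parents are ready immediately

    lock = threading.Lock()
    finished = []

    def worker():
        while True:
            job = ready.get()  # the worker pulls work; nothing is pushed to it
            if job is None:    # sentinel: the whole workflow is done
                return
            # A real worker would execute the job here.
            with lock:
                finished.append(job)
                for child in children[job]:
                    remaining[child] -= 1
                    if remaining[child] == 0:
                        ready.put(child)
                if len(finished) == len(deps):
                    for _ in range(num_workers):
                        ready.put(None)  # wake every worker so it can exit

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished


# Hypothetical four-job workflow, loosely shaped like an image-mosaic pipeline.
example = {
    "project1": set(),
    "project2": set(),
    "diff": {"project1", "project2"},
    "mosaic": {"diff"},
}
order = run_workflow(example)
print(order)
```

The exact completion order of the two independent jobs depends on thread timing, but every job always appears after all of its parents.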

Citations: 25

An Energy-Efficient Branch Prediction with Grouped Global History

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.23

Mingkai Huang, Dan He, Xianhua Liu, Mingxing Tan, Xu Cheng

Abstract: Branch prediction has been playing an increasingly important role in improving the performance and energy efficiency of modern microprocessors. State-of-the-art branch predictors, such as the perceptron and TAGE predictors, leverage novel prediction algorithms to explore longer branch history for higher prediction accuracy. We observe that as the branch history becomes longer, the efficiency of global history is degraded by interference among different branch instructions. To mitigate the excessive influence of branch history interference, we propose the Grouped Global History (GGH) based branch predictor, a lightweight yet efficient branch predictor. Unlike existing branch predictors that use a unified global history for prediction, GGH divides the global history into a set of subgroups so that the interference caused by frequently executed branch instructions can be restricted. With subgroups of global history, GGH also enables us to track even longer effective branch correlation without introducing hardware storage overhead. Our experimental results based on SPEC CINT 2006 workloads demonstrate that our approach can significantly reduce the branch mispredictions per kilo instructions (MPKI) by 4.76 over the baseline perceptron predictor, with a simple control logic extension.

Citations: 2

DPX10: An Efficient X10 Framework for Dynamic Programming Applications

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.96

Chen Wang, Ce Yu, Ji-zhou Sun, X. Meng

Citations: 1

PPM: A Partitioned and Parallel Matrix Algorithm to Accelerate Encoding/Decoding Process of Asymmetric Parity Erasure Codes

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.55

Shiyi Li, Q. Cao, Shenggang Wan, Wenhui Zhang, C. Xie, Xubin He, P. Subedi

Citations: 2

Using Per-Loop CPU Clock Modulation for Energy Efficiency in OpenMP Applications

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.72

Wei Wang, Allan Porterfield, John Cavazos, Sridutt Bhalachandra

Abstract: As the HPC community moves into the exascale computing era, application energy is becoming as large a concern as performance. Optimizing for energy will be essential in the effort to overcome the limited power envelope. Existing efforts to optimize energy in applications employ Dynamic Voltage and Frequency Scaling (DVFS) to maximize energy savings in less compute-intensive regions or non-critical execution paths. However, we found that DVFS has high power state switching overhead, preventing its use when a more fine-grained technique is necessary. In this work, we take advantage of the low transition overhead of CPU clock modulation and apply it to fine-grained OpenMP parallel loops. The energy behavior of OpenMP parallel regions is first characterized by changing the effective frequency using clock modulation. The clock modulation setting that achieves the best energy efficiency is then determined for each region. Finally, different CPU clock modulation settings are applied to the different loops within the same application. The resulting multi-frequency execution of OpenMP applications achieves a better energy-delay trade-off than any single frequency setting. In the best case, the multi-frequency approach achieved 8.6% energy savings with less than 1.5% execution time increase. Concurrency throttling (i.e., reducing the number of hardware threads used by an application) saves more energy and can be combined with CPU clock modulation. Using both, we see energy savings of 21% and an improvement in energy-delay product (EDP) of 16%.
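
The energy-delay figures quoted in this abstract can be related with a little arithmetic. The sketch below assumes the standard definition EDP = energy × execution time, treats "less than 1.5%" as an upper bound of exactly 1.5%, and assumes the 21% energy and 16% EDP figures refer to the same run, which the abstract does not state. The function names are hypothetical.

```python
def relative_edp(energy_saving, time_increase):
    """EDP relative to the baseline (1.0 means unchanged).

    EDP = energy * time, so the relative value is the product of the
    relative energy and the relative execution time.
    """
    return (1.0 - energy_saving) * (1.0 + time_increase)


def implied_time_ratio(energy_saving, edp_improvement):
    """Relative execution time implied by an energy saving and an EDP gain."""
    return (1.0 - edp_improvement) / (1.0 - energy_saving)


# Clock modulation alone: 8.6% less energy, at most 1.5% more time.
best_case_edp = relative_edp(0.086, 0.015)

# Clock modulation plus concurrency throttling: 21% less energy, 16% better EDP.
combined_time_ratio = implied_time_ratio(0.21, 0.16)

print(f"relative EDP, clock modulation only: {best_case_edp:.4f}")
print(f"implied relative run time, combined: {combined_time_ratio:.4f}")
```

Under these assumptions, the clock-modulation-only case gives a relative EDP of about 0.928 (an improvement of at least roughly 7%), and the combined figures imply a run time about 1.063 times the baseline.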

Citations: 21

Energy-Efficient and Delay-Constrained Broadcast in Time-Varying Energy-Demand Graphs

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.28

Chenxi Qiu, Haiying Shen, Lei Yu

Citations: 1

A Testing Engine for High-Performance and Cost-Effective Workflow Execution in the Cloud

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.94

V. Pallipuram, Trilce Estrada, M. Taufer

Abstract: While pursuing high performance and cost effectiveness for directed acyclic graph (DAG)-structured scientific workflow executions in the cloud, it is critical to identify appropriate resource instances and their quantity. This paper presents a testing engine that employs a resource-selection heuristic, which statically analyzes the DAG structure to guide the selection of resource instances, both how many and which ones. The testing engine combines the heuristic with two platform-independent DAG-scheduling policies, the Area-oriented DAG-scheduling heuristic (AO) and the Locally-Optimal heuristic (L-OPT), to perform extensive validation assessments. The testing engine ensures the realism of these assessments by modeling the performance variability of the cloud platform using real traces. The testing engine also enables cost-effectiveness analysis that guides users to select a small set of instance candidates that provide a performance-cost trade-off. Our empirical results show that pairing the resource-selection heuristic with the AO scheduling policy is a powerful method for cost-effective DAG-structured workflow execution in the cloud.

Citations: 4

Generating Efficient Tensor Contractions for GPUs

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.106

T. Nelson, Axel Rivera, Prasanna Balaprakash, Mary W. Hall, P. Hovland, E. Jessup, B. Norris

Citations: 37

Good Work Deserves Good Pay: A Quality-Based Surplus Sharing Method for Participatory Sensing

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.47

Shuo Yang, Fan Wu, Shaojie Tang, Xiaofeng Gao, Bo Yang, Guihai Chen

Citations: 10