2010 39th International Conference on Parallel Processing: Latest Papers

Optimal Task Reallocation in Heterogeneous Distributed Computing Systems with Age-Dependent Delay Statistics
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.20
J. Pezoa, M. Hayat, Zhuoyao Wang, S. Dhakal
Abstract: This paper presents a general framework for optimal task reallocation in heterogeneous distributed-computing systems and offers a rigorous analytical model for the stochastic execution time of a workload. The model takes into account the heterogeneity and stochastic nature of the tasks' service and transfer times, servers' failure times, as well as an arbitrary task-reallocation policy. The stochastic service, transfer and failure times are assumed to have general, age-dependent (non-exponential) distributions, resulting in a tandem distributed queuing system with non-Markovian dynamics. Auxiliary age variables are introduced in the analysis to capture the memory associated with the non-Markovian stochastic times, thereby enabling a regenerative age-dependent analytical characterization of the statistics of the execution time of a workload. The model is utilized to devise task reallocation policies that optimize three metrics: the average execution time of a workload, the quality-of-service in executing a workload by a prescribed deadline and the reliability in executing a workload. Implications of the non-exponential event times on these metrics are also studied. Key results are verified experimentally on a distributed-computing testbed.
Citations: 5
Detailed Load Balance Analysis of Large Scale Parallel Applications
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.61
K. Huck, J. Labarta
Abstract: Balancing the workload in parallel applications is a difficult task, even in conventional cases. Many computing cycles are wasted when the load is not evenly balanced across processing nodes. Global load balance analysis may determine that an application is well balanced, when in fact the application has hidden inefficiencies. In this paper, we consider the load balance of parallel applications which present unique challenges in the analysis process. We have performed trace analysis and simulation to demonstrate the existence of otherwise undiscovered performance issues. We also demonstrate that by collecting dynamic phase profiles, we are able to approximate the analysis results of trace analysis and simulation, and more accurately represent the performance behavior of complex parallel applications than through flat or callpath profiles alone.
Citations: 12
A MapReduce Style Framework for Computations on Trees
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.42
William Sarje, S. Aluru
Abstract: The emergence of cloud computing and Google's MapReduce paradigm is renewing interest in the development of broadly applicable high level abstractions as a means to deliver easy programmability and cyber resources to the user, while hiding complexities of system architecture, parallelism and algorithms, heterogeneity, and fault-tolerance. In this paper, we present a high-level framework for computations on tree structures. Despite the diversity and types of tree structures, and the algorithmic ways in which they are utilized, our abstraction provides sufficient generality to be broadly applicable. We show how certain frequently used operations on tree structures can be cast in terms of our framework. We further demonstrate the applicability of our framework by solving two applications -- k-nearest neighbors and fast multipole method (FMM) based simulations -- by merely using our framework in multiple ways. We developed a generic programming based implementation of the framework using C++ and MPI, and demonstrate its performance on the aforementioned applications using homogeneous multi-core clusters.
Citations: 8
Automatic Generation of Stream Descriptors for Streaming Architectures
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.38
L. Gao, David Zaretsky, Gaurav Mittal, D. Schonfeld, P. Banerjee
Abstract: We describe a novel approach for automatically generating streaming architectures from software programs. While existing systems require user-defined stream models, our method automatically identifies producer-consumer streaming relationships and translates them into streaming architectures. Data streams between producer-consumer kernels are represented using a combination of stream descriptors and CFGs, which are categorized into four stream types. A bridge module is generated based on the stream type in the streaming architecture to facilitate data streaming between each producer-consumer pair. Several optimizations are also developed to improve throughput and parallelism. We demonstrate our results on a FPGA based platform. The automatically generated streaming architectures show 1.5-3x speedups over the non-streaming designs by employing spatial and temporal data independence to increase parallelism.
Citations: 1
Incentive Compatible Online Scheduling of Malleable Parallel Jobs with Individual Deadlines
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.60
T. E. Carroll, Daniel Grosu
Abstract: We consider the online scheduling of malleable jobs on parallel systems, such as clusters, symmetric multiprocessing computers, and multi-core processor computers. Malleable jobs is a model of parallel processing in which jobs adapt to the number of processors assigned to them. This model permits the scheduler and resource manager to make more efficient use of the available resources. Each malleable job is characterized by arrival time, deadline, and value. If the job completes by its deadline, the user earns the payoff indicated by the value; otherwise, she earns a payoff of zero. The scheduling objective is to maximize the sum of the values of the jobs that complete by their associated deadlines. Complicating the matter is that users in the real world are rational and they will attempt to manipulate the scheduler by misreporting their jobs' parameters if it benefits them to do so. To mitigate this behavior, we design an incentive compatible online scheduling mechanism. Incentive compatibility assures us that the users will obtain the maximum payoff only if they truthfully report their jobs' parameters to the scheduler. Finally, we simulate and study the mechanism to show the effects of misreports on the cheaters and on the system.
Citations: 24
Optimizing HPC Fault-Tolerant Environment: An Analytical Approach
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.80
Hui Jin, Yong Chen, Huaiyu Zhu, Xian-He Sun
Abstract: The increasingly large ensemble size of modern High-Performance Computing (HPC) systems has drastically increased the possibility of failures. Performance under failures and its optimization become timely important issues facing the HPC community. In this study, we propose an analytical model to predict the application performance. The model characterizes the impact of coordinated checkpointing and system failures on application performance, considering all the factors including workload, the number of nodes, failure arrival rate, recovery cost, and checkpointing interval and overhead. Based on the model, we gauge three parameters, the number of compute nodes, checkpointing interval, and the number of spare nodes to conduct a comprehensive study of performance optimization under failures. Performance scalability under failures is also studied to explore the performance improvement space for different parameters. Experimental results from both synthetic and actual system failure logs confirm that the proposed model and optimization methodologies are effective and feasible.
Citations: 45
Subgraph Enumeration in Large Social Contact Networks Using Parallel Color Coding and Streaming
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.67
Zhao Zhao, Maleq Khan, V. S. A. Kumar, M. Marathe
Abstract: Identifying motifs (or commonly occurring subgraphs/templates) has been found to be useful in a number of applications, such as biological and social networks; they have been used to identify building blocks and functional properties, as well as to characterize the underlying networks. Enumerating subgraphs is a challenging computational problem, and all prior results have considered networks with a few thousand nodes. In this paper, we develop a parallel subgraph enumeration algorithm, ParSE, that scales to networks with millions of nodes. Our algorithm is a randomized approximation scheme, that estimates the subgraph frequency to any desired level of accuracy, and allows enumeration of a class of motifs that extends those considered in prior work. Our approach is based on parallelization of an approach called color coding, combined with a stream based partitioning. We also show that ParSE scales well with the number of processors, over a large range.
Citations: 52
Task Assignment with Cache Partitioning and Locking for WCET Minimization on MPSoC
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.65
Tiantian Liu, Yingchao Zhao, Minming Li, C. Xue
Abstract: Cache is known for its unpredictability in embedded systems. Cache locking technique is often utilized to guarantee a tighter prediction of Worst-Case Execution Time (WCET) which is one of the most important performance metrics for embedded systems. However, in Multi-Processor Systems-on-Chip (MPSoC) systems with multi-tasks, Level 2 (L2) cache is often shared among different tasks and cores, which leads to higher complexity in the cache management and extended unpredictability of cache. Task assignment has inherent relevancy for cache behavior, while cache behavior also affects the efficiency of task assignment. Task assignment and cache behavior have dramatic influences on the overall WCET of MPSoC. In this paper, overall WCET represents the worst-case finishing time of a set of tasks running on different cores. This paper proposes joint task assignment and cache partitioning techniques to minimize the overall WCET for MPSoC systems. Cache locking is applied to each task to guarantee a precise WCET, which in return facilitates task assignment and cache partitioning. We prove that the joint problem is NP-Hard and propose several efficient algorithms. Experimental results show that the proposed algorithms can consistently reduce the overall WCET compared to previous techniques.
Citations: 34
Checkpointing vs. Migration for Post-Petascale Supercomputers
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.26
F. Cappello, H. Casanova, Y. Robert
Abstract: An alternative to classical fault-tolerant approaches for large-scale clusters is failure avoidance, by which the occurrence of a fault is predicted and a preventive measure is taken. We develop analytical performance models for two types of preventive measures: preventive checkpointing and preventive migration. We also develop an analytical model of the performance of a standard periodic checkpoint fault-tolerant approach. We instantiate these models for platform scenarios representative of current and future technology trends. We find that preventive migration is the better approach in the short term by orders of magnitude. However, in the longer term, both approaches have comparable merit with a marginal advantage for preventive checkpointing. We also find that standard non-prediction-based fault tolerance achieves poor scaling when compared to prediction-based failure avoidance, thereby demonstrating the importance of failure prediction capabilities. Finally, our results show that achieving good utilization in truly large-scale machines (e.g., 2^{20} nodes) for parallel workloads will require more than the failure avoidance techniques evaluated in this work.
Citations: 23
A Quantitative Study of Accountability in Wireless Multi-hop Networks
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.29
Zhifeng Xiao, Yang Xiao, Jie Wu
Abstract: In this paper, we explore a quantitative approach to accountable wireless multi-hop networks. We propose using hierarchical P-Accountability to adapt the requirements of modeling a complex network environment and assess the degree of accountability in a fine-grained manner. We have defined P-Accountability and demonstrated its use in the hierarchical network environment. In addition, we apply P-Accountability to a wireless multi-hop network system. Both numerical and simulation results show that our approach is applicable to most accountable systems and that it provides a flexible and comprehensive view of the degree of accountability.
Citations: 13