IEEE International Symposium on High-Performance Parallel Distributed Computing: Latest Publications

Coupling scheduler for MapReduce/Hadoop
Jian Tan, Xiaoqiao Meng, Li Zhang
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287097
Abstract: Current MapReduce/Hadoop schedulers are quite successful at delivering good performance, but room for improvement remains: map and reduce tasks are not jointly optimized during scheduling, despite the strong dependence between them. This can cause job starvation and poor data locality. We design a resource-aware scheduler for Hadoop that couples the progress of mappers and reducers and jointly optimizes the placement of both. This mitigates the starvation problem and improves overall data locality. Our experiments demonstrate improvements in job response times of up to an order of magnitude.
Citations: 14

QBox: guaranteeing I/O performance on black box storage systems
Dimitris Skourtis, S. Kato, S. Brandt
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287087
Abstract: Many storage systems are shared by multiple clients with different types of workloads and performance targets. To achieve performance targets without over-provisioning, a system must provide isolation between clients. Throughput-based reservations are challenging due to the mix of workloads and the stateful nature of disk drives, leading to low reservable throughput, while existing utilization-based solutions require specialized I/O scheduling for each device in the storage system. QBox is a new utilization-based approach for generic black-box storage systems that enforces utilization (and, indirectly, throughput) requirements and provides isolation between clients, without specialized low-level I/O scheduling. Our experimental results show that QBox provides good isolation and achieves the target utilizations of its clients.
Citations: 15

Work stealing and persistence-based load balancers for iterative overdecomposed applications
J. Lifflander, S. Krishnamoorthy, L. Kalé
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287103
Abstract: Applications often involve iterative execution of identical or slowly evolving calculations. Such applications require incremental rebalancing to improve load balance across iterations. In this paper, we consider the design and evaluation of two distinct approaches to addressing this challenge: persistence-based load balancing and work stealing. The work to be performed is overdecomposed into tasks, enabling automatic rebalancing by the middleware. We present a hierarchical persistence-based rebalancing algorithm that performs localized incremental rebalancing. We also present an active-message-based retentive work stealing algorithm optimized for iterative applications on distributed memory machines. We demonstrate low overheads and high efficiencies on the full NERSC Hopper (146,400 cores) and ALCF Intrepid (163,840 cores) systems, and on up to 128,000 cores on OLCF Titan.
Citations: 69

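As an illustration only (not the paper's active-message implementation), the retentive idea behind this entry can be sketched in a few lines of Python: the worker that executes each task in one iteration is recorded, and that placement seeds the task deques of the next iteration, so a persistent workload starts out balanced and steals far less. The worker count, task count, and cost model below are invented for the sketch.

```python
import random
from collections import deque

def run_iteration(placement, cost, nworkers):
    """Run one iteration: every worker executes one cost unit per time step,
    pulls tasks from its own deque, and steals from the most loaded deque
    when it runs dry. Returns (steal_count, new_placement), where the new
    placement records which worker actually ran each task."""
    deques = [deque() for _ in range(nworkers)]
    for task in sorted(placement):
        deques[placement[task]].append(task)
    current = [None] * nworkers                  # [task, units_left] per worker
    new_placement, steals = {}, 0
    while True:
        busy = False
        for w in range(nworkers):
            if current[w] is None:
                if not deques[w]:                # out of local work: try to steal
                    victim = max(range(nworkers), key=lambda v: len(deques[v]))
                    if victim != w and deques[victim]:
                        deques[w].append(deques[victim].pop())
                        steals += 1
                if deques[w]:
                    task = deques[w].popleft()
                    new_placement[task] = w      # retained for the next iteration
                    current[w] = [task, cost[task]]
            if current[w] is not None:
                busy = True
                current[w][1] -= 1
                if current[w][1] == 0:
                    current[w] = None
        if not busy:
            return steals, new_placement

random.seed(1)
cost = {t: random.randint(1, 5) for t in range(64)}   # 64 overdecomposed tasks
placement = {t: 0 for t in cost}                      # iteration 1: all work on worker 0
s1, placement = run_iteration(placement, cost, 8)
s2, _ = run_iteration(placement, cost, 8)             # seeded from retained placement
print(s1, s2)  # retention should leave far fewer steals in the second iteration
```

The second iteration starts from the placement the first one converged to, which is the "retentive" part: for slowly evolving workloads, most tasks run where they ran before and stealing only corrects the drift.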
A system-aware optimized data organization for efficient scientific analytics
Yuan Tian, S. Klasky, Weikuan Yu, H. Abbasi, Bin Wang, N. Podhorszki, R. Grout, M. Wolf
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287095
Abstract: Large-scale scientific applications on high-end computing systems produce large volumes of highly complex data. Such data poses a grand challenge to conventional storage systems, which need efficient I/O solutions during both the simulation runtime and data post-processing phases. With the mounting needs of scientific discovery, the read performance of large-scale simulations has become a critical issue for the HPC community. In this study, we propose a system-aware optimized data organization strategy that can arrange the blocks of multidimensional scientific data efficiently, based on the simulation output and the underlying storage system, thereby enabling efficient scientific analytics. Our experimental results demonstrate a speedup of up to 72x for the combustion simulation S3D compared to the logically contiguous data layout.
Citations: 4

Enabling event tracing at leadership-class scale through I/O forwarding middleware
T. Ilsche, Joseph Schuchart, Jason Cope, D. Kimpe, T. Jones, A. Knüpfer, K. Iskra, R. Ross, W. Nagel, S. Poole
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287085
Abstract: Event tracing is an important tool for understanding the performance of parallel applications. As concurrency increases in leadership-class computing systems, the quantity of performance log data can overload the parallel file system, perturbing the application being observed. In this work we present a solution for event tracing at leadership scales. We enhance the I/O forwarding system software to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system for this type of traffic. Furthermore, we augment the I/O forwarding system with a write buffering capability to limit the impact of artificial perturbations from log data accesses on traced applications. To validate the approach, we modify the Vampir tracing toolset to take advantage of this new capability and show that the approach increases the maximum traced application size by a factor of 5x, to more than 200,000 processes.
Citations: 25

Understanding the effects and implications of compute node related failures in Hadoop
Florin Dinu, T. Ng
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287108
Abstract: Hadoop has become a critical component in today's cloud environment. Ensuring good performance for Hadoop is paramount for the wide range of applications built on top of it. In this paper we analyze Hadoop's behavior under failures involving compute nodes. We find that even a single failure can result in inflated, variable and unpredictable job running times, all undesirable properties in a distributed system. We systematically track the causes underlying this distressing behavior. First, we find that Hadoop makes unrealistic assumptions about task progress rates. These assumptions can be easily invalidated by the cloud environment and, more surprisingly, by Hadoop's own design decisions. The result is significant inefficiency in Hadoop's speculative execution algorithm. Second, failures are re-discovered individually by each task, at the cost of great degradation in job running time. The reason is that Hadoop focuses on extreme scalability and thus trades off possible improvements from sharing failure information between tasks. Third, Hadoop does not consider the causes of connection failures between its tasks. We show that the resulting overloading of connection failure semantics unnecessarily causes an otherwise localized failure to propagate to healthy tasks. We also discuss the implications of our findings and draw attention to new ways of improving Hadoop-like frameworks.
Citations: 81

Putting a "big-data" platform to good use: training kinect
M. Budiu
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287078
Abstract: In the last 7 years at Microsoft Research in Silicon Valley we have constructed the DryadLINQ software stack for large-scale data-parallel cluster computations. The architecture of the ensemble is depicted in Figure 1. The goal of the DryadLINQ project is to make writing parallel programs that manipulate large amounts of data (terabytes to petabytes) as easy as programming a single machine. DryadLINQ is a batch computation model optimized for throughput; since it targets large clusters of commodity computers, fault tolerance is a primary concern. A primary tenet is that moving computation close to the data is much cheaper than moving the data itself. Here we briefly discuss the current architecture of the system (further research is ongoing). Our software runs on relatively inexpensive computer clusters, using unmodified Windows Server, and makes minimal assumptions about the underlying cluster.
Citations: 2

Dynamic adaptive virtual core mapping to improve power, energy, and performance in multi-socket multicores
C. Bae, Lei Xia, P. Dinda, J. Lange
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287114
Abstract: Consider a multithreaded parallel application running inside a multicore virtual machine context that is itself hosted on a multi-socket multicore physical machine. How should the VMM map virtual cores to physical cores? We compare a local mapping, which compacts virtual cores to processor sockets, and an interleaved mapping, which spreads them over the sockets. Simply choosing between these two mappings exposes clear tradeoffs between performance, energy, and power. We then describe the design, implementation, and evaluation of a system that automatically and dynamically chooses between the two mappings. The system consists of a set of efficient online VMM-based mechanisms and policies that (a) capture the relevant characteristics of memory reference behavior, (b) provide a policy and mechanism for configuring the mapping of virtual machine cores to physical cores that optimizes for power, energy, or performance, and (c) drive dynamic migrations of virtual cores among local physical cores based on the workload and the currently specified objective. Using these techniques, we demonstrate that the performance of SPEC and PARSEC benchmarks can be increased by as much as 66%, energy reduced by as much as 31%, and power reduced by as much as 17%, depending on the optimization objective.
Citations: 20

Exploring the performance and mapping of HPC applications to platforms in the cloud
Abhishek K. Gupta, L. Kalé, D. Milojicic, P. Faraboschi, R. Kaufmann, Verdi March, F. Gioachin, Chun Hui Suen, Bu-Sung Lee
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287093
Abstract: This paper presents a scheme to optimize the mapping of HPC applications to a set of hybrid dedicated and cloud resources. First, we characterize application performance on dedicated clusters and in the cloud to obtain application signatures. Then, we propose an algorithm to match these signatures to resources such that performance is maximized and cost is minimized. Finally, we show simulation results revealing that, in a concrete scenario, our proposed scheme reduces cost by 60% at only a 10-15% performance penalty versus a non-optimized configuration. We also find that the execution overhead in the cloud can be reduced to a negligible level using thin hypervisors or OS-level containers.
Citations: 37

vSlicer: latency-aware virtual machine scheduling via differentiated-frequency CPU slicing
Cong Xu, S. Gamage, P. N. Rao, Ardalan Kangarlou, R. Kompella, Dongyan Xu
Pub Date: 2012-06-18 | DOI: 10.1145/2287076.2287080
Abstract: Recent advances in virtualization technologies have made it feasible to host multiple virtual machines (VMs) on the same physical host, and even the same CPU core, with a fair share of the physical resources among the VMs. However, as more VMs share the same core/CPU, the CPU access latency experienced by each VM increases substantially, which translates into longer I/O processing latency perceived by I/O-bound applications. To mitigate this impact while retaining the benefit of CPU sharing, we introduce a new class of VMs called latency-sensitive VMs (LSVMs), which achieve better performance for I/O-bound applications while maintaining the same resource share (and thus cost) as other CPU-sharing VMs. LSVMs are enabled by vSlicer, a hypervisor-level technique that schedules each LSVM more frequently but with a smaller micro time slice. vSlicer enables more timely processing of I/O events by LSVMs without violating CPU share fairness among all sharing VMs. Our evaluation of a vSlicer prototype in Xen shows that vSlicer substantially reduces network packet round-trip times and jitter and improves application-level performance. For example, vSlicer doubles both the connection rate and request processing throughput of an Apache web server, reduces a VoIP server's upstream jitter by 62%, and shortens the execution times of Intel MPI benchmark programs by half or more.
Citations: 104

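The differentiated-frequency idea in this entry lends itself to a toy sketch (the two-VM host, millisecond slice lengths, and simplified schedules below are assumptions of mine, not vSlicer's Xen implementation): the LSVM keeps its 50% CPU share but receives it as small, frequent micro slices, which shrinks the worst-case wait before it next gets the CPU.

```python
def round_schedule(lsvm_slice, period):
    """Build one scheduling period in which a latency-sensitive VM (LSVM)
    and a regular VM each receive exactly half the CPU, but the LSVM's
    half is chopped into micro slices scheduled more frequently."""
    n = (period // 2) // lsvm_slice              # number of micro slices
    schedule = []
    for _ in range(n):
        schedule.append(("lsvm", lsvm_slice))    # short, frequent slice
        schedule.append(("reg", (period // 2) // n))
    return schedule

def max_start_gap(schedule, vm):
    """Largest gap (ms) between consecutive scheduling points of `vm`: a rough
    proxy for the worst-case delay before it can process a pending I/O event."""
    t, starts = 0, []
    for name, length in schedule * 2:            # unroll two rounds for wrap-around
        if name == vm:
            starts.append(t)
        t += length
    return max(b - a for a, b in zip(starts, starts[1:]))

plain = [("lsvm", 30), ("reg", 30)]              # ordinary fair slicing: 30 ms each
micro = round_schedule(5, 60)                    # same 50% share, 5 ms micro slices
assert sum(l for v, l in micro if v == "lsvm") == 30   # CPU share is unchanged
print(max_start_gap(plain, "lsvm"), max_start_gap(micro, "lsvm"))  # prints 60 10
```

With equal 50% shares, the micro-sliced schedule cuts the LSVM's worst-case wait from one full slice of the other VM (30 ms here) down to one micro-slice interval, which is the mechanism behind the reduced round-trip times the abstract reports.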