Title: The Impact of Soft Resource Allocation on n-Tier Application Scalability
Authors: Qingyang Wang, Simon Malkowski, D. Jayasinghe, Pengcheng Xiong, C. Pu, Yasuhiko Kanemasa, Motoyuki Kawaba, L. Harada
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 16, 2011
DOI: 10.1109/IPDPS.2011.99 (https://doi.org/10.1109/IPDPS.2011.99)
Abstract: Good performance and efficiency, for example in terms of high quality of service and resource utilization, are important goals in a cloud environment. Through extensive measurements of an n-tier application benchmark (RUBBoS), we show that overall system performance is surprisingly sensitive to appropriate allocation of soft resources (e.g., server thread pool size). Inappropriate soft resource allocation can quickly and significantly degrade overall application performance. Concretely, both under-allocation and over-allocation of the thread pool can lead to bottlenecks in other resources because of non-trivial dependencies. We have observed some non-obvious phenomena due to these correlated bottlenecks. For instance, the number of threads in the Apache web server can limit the total useful throughput, causing the CPU utilization of the C-JDBC clustering middleware to decrease as the workload increases. We provide a practical iterative solution to this challenge through an algorithmic combination of operational queuing laws and measurement data. Our results show that soft resource allocation plays a central role in the performance scalability of complex systems such as n-tier applications in cloud environments.
Title: Reducing Fragmentation on Torus-Connected Supercomputers
Authors: Wei Tang, Z. Lan, N. Desai, Daniel Buettner, Yongen Yu
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 16, 2011
DOI: 10.1109/IPDPS.2011.82 (https://doi.org/10.1109/IPDPS.2011.82)
Abstract: Torus-based networks are prevalent on leadership-class petascale systems, providing a good balance between network cost and performance. The major disadvantage of this network architecture is its susceptibility to fragmentation. Many studies have attempted to reduce resource fragmentation in this architecture. Although the suggested approaches can make good allocation decisions that reduce fragmentation at job start time, none of them considers a job's wall time, which can cause resource fragmentation when neighboring jobs do not complete at similar times. In this paper, we propose a wall-time-aware job allocation strategy that packs adjacently those jobs expected to finish around the same time, in order to minimize the resource fragmentation caused by job length discrepancy. Event-driven simulations using real job traces from a production Blue Gene/P system at Argonne National Laboratory demonstrate that our wall-time-aware strategy can effectively reduce system fragmentation and improve overall system performance.
Title: Enriching 3-D Video Games on Multicores
Authors: Romain Cledat, T. Kumar, Jaswanth Sreeram, S. Pande
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 16, 2011
DOI: 10.1109/IPDPS.2011.26 (https://doi.org/10.1109/IPDPS.2011.26)
Abstract: The introduction of multicore processors on desktops and other personal computing platforms has given rise to multiple interesting end-user application possibilities. One important trend is the increased presence of resource-hungry applications such as gaming and multimedia. A key distinguishing factor of these applications is that they are amenable to variable semantics (i.e., multiple acceptable results), unlike traditional applications in which a fixed, unique answer is expected. For example, varying degrees of image processing improve picture quality, different model complexities in game physics allow different degrees of realism during game play, and so on. The goal of this paper is to demonstrate that the semantics of applications such as video games can be enriched with optional tasks that are launched opportunistically and thus adapt to the amount of resources available at runtime. We propose a C/C++ API that allows the programmer to define how the current semantics of a program can be opportunistically enriched, as well as the underlying runtime system that orchestrates the different computations. We show how this infrastructure can be used to enrich the well-known game Quake 3. Our results show that it is possible to perform significant enrichment without degrading the application's performance by utilizing additional cores.
Title: Characterization of System Services and Their Performance Impact in Multi-core Nodes
Authors: Seetharami R. Seelam, L. Fong, John Lewars, John Divirgilio, B. F. Veale, K. Gildea
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 16, 2011
DOI: 10.1109/IPDPS.2011.20 (https://doi.org/10.1109/IPDPS.2011.20)
Abstract: The performance of parallel applications on large-scale systems has been shown to degrade disproportionately due to interference from system services. This interference is also known as jitter. However, there is limited understanding of the sources and patterns of jitter on multi-core systems. In this paper, we identify and characterize jitter sources in terms of their amplitude and execution-interval distributions on multi-core IBM Power systems running UNIX-based general-purpose operating systems: AIX and Linux. Our analysis shows that there are various kinds of jitter sources and that, for practical reasons, their execution varies drastically between cores and between the hardware threads within each core. This in-depth knowledge of jitter events is leveraged to devise effective approaches for mitigating the impact of jitter on application performance in large-scale systems. Moreover, such knowledge provides useful insights for a new generation of operating system designs for multi-core systems, such as multikernel or satellite-kernel designs.
Title: Shared Resource Monitoring and Throughput Optimization in Cloud-Computing Datacenters
Authors: Jaideep Moses, R. Iyer, R. Illikkal, S. Srinivasan, K. Aisopos
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 16, 2011
DOI: 10.1109/IPDPS.2011.98 (https://doi.org/10.1109/IPDPS.2011.98)
Abstract: Many data centers employ server consolidation to maximize the efficiency of platform resource usage. As a result, multiple virtual machines (VMs) run simultaneously on each data center platform. Contention for shared resources between these virtual machines has an undesirable and non-deterministic impact on their performance. This paper proposes the use of shared resource monitoring to (a) understand the resource usage of each virtual machine on each platform, (b) collect resource usage and performance across different platforms to correlate the implications of usage for performance, and (c) migrate VMs that are resource-constrained to improve overall data center throughput and Quality of Service (QoS). We focus our efforts on monitoring and addressing shared cache contention and propose a new optimization metric that captures the priority of each VM and the overall weighted throughput of the data center. We conduct detailed experiments emulating data center scenarios, including online transaction processing workloads (based on TPC-C), middle-tier workloads (based on SPECjbb and SPECjAppServer), and financial workloads (based on PARSEC). We show that monitoring shared resource contention (such as shared cache contention) is highly beneficial for managing throughput and QoS in a cloud-computing data center environment.
{"title":"A Scalable Reverse Lookup Scheme Using Group-Based Shifted Declustering Layout","authors":"Junyao Zhang, Pengju Shang, Jun Wang","doi":"10.1109/IPDPS.2011.64","DOIUrl":"https://doi.org/10.1109/IPDPS.2011.64","url":null,"abstract":"Recent years have witnessed an increasing demand for super data clusters. The super data clusters have reached the petabyte-scale that can consist of thousands or tens of thousands storage nodes at a single site. For this architecture, reliability is becoming a great concern. In order to achieve a high reliability, data recovery and node reconstruction is a must. Although extensive research works have investigated how to sustain high performance and high reliability in case of node failures at large scale, a reverse lookup problem, namely finding the objects list for the failed node remains open. This is especially true for storage systems with high requirement of data integrity and availability, such as scientific research data clusters and etc. Existing solutions are either time consuming or expensive. Meanwhile, replication based block placement can be used to realize fast reverse lookup. However, they are designed for centralized, small-scale storage architectures. In this paper, we propose a fast and efficient reverse lookup scheme named Group-based Shifted Declustering (G-SD) layout that is able to locate the whole content of the failed node. G-SD extends our previous shifted declustering layout and applies to large-scale file systems. Our mathematical proofs and real-life experiments show that G-SD is a scalable reverse lookup scheme that is up to one order of magnitude faster than existing schemes.","PeriodicalId":355100,"journal":{"name":"2011 IEEE International Parallel & Distributed Processing Symposium","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123771585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: CABdedupe: A Causality-Based Deduplication Performance Booster for Cloud Backup Services
Authors: Yujuan Tan, Hong Jiang, D. Feng, Lei Tian, Zhichao Yan
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 16, 2011
DOI: 10.1109/IPDPS.2011.76 (https://doi.org/10.1109/IPDPS.2011.76)
Abstract: Because of the relatively low bandwidth of the WAN (Wide Area Network) links that support cloud backup services, both the backup time and the restore time in the cloud backup environment are in desperate need of reduction to make cloud backup a practical and affordable service for small businesses and telecommuters alike. Existing solutions that employ deduplication for cloud backup services focus only on removing redundant data from transmission during backup operations to reduce the backup time, while paying little attention to the restore time, which we argue is an important aspect that affects the overall quality of service of cloud backup. In this paper, we propose a CAusality-Based deduplication performance booster for both cloud backup and restore operations, called CABdedupe, which captures the causal relationships among chronological versions of datasets processed in multiple backups/restores, in order to remove unmodified data from transmission during not only backup operations but also restore operations, thereby improving both backup and restore performance. CABdedupe is a middleware that is orthogonal to, and can be integrated into, any existing backup system. Our extensive experiments, in which we integrate CABdedupe into two existing backup systems and feed them real-world datasets, show that both the backup time and the restore time are significantly reduced, with a reduction ratio of up to 103:1.
{"title":"Automatic Recognition of Performance Idioms in Scientific Applications","authors":"J. He, A. Snavely, R. V. D. Wijngaart, M. Frumkin","doi":"10.1109/IPDPS.2011.21","DOIUrl":"https://doi.org/10.1109/IPDPS.2011.21","url":null,"abstract":"Basic data flow patterns that we call textbf{performance idioms}, such as stream, transpose, reduction, random access and stencil, are common in scientific numerical applications. We hypothesize that a small number of idioms can cover most programming constructs that dominate the execution time of scientific codes and can be used to approximate the application performance. To check these hypotheses, we proposed an automatic idioms recognition method and implemented the method, based on the open source compiler Open64. With the NAS Parallel Benchmark (NPB) as a case study, the prototype system is about $90%$ accurate compared with idiom classification by a human expert. Our results showed that the above five idioms suffice to cover $100%$ of the six NPB codes (MG, CG, FT, BT, SP and LU). We also compared the performance of our idiom benchmarks with their corresponding instances in the NPB codes on two different platforms with different methods. The approximation accuracy is up to $96.6%$. The contribution is to show that a small set of idioms can cover more complex codes, that idioms can be recognized automatically, and that suitably defined idioms may approximate application performance.","PeriodicalId":355100,"journal":{"name":"2011 IEEE International Parallel & Distributed Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129113930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs
Authors: Z. Szebenyi, T. Gamblin, M. Schulz, B. Supinski, F. Wolf, B. Wylie
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 16, 2011
DOI: 10.1109/IPDPS.2011.67 (https://doi.org/10.1109/IPDPS.2011.67)
Abstract: We can profile the performance behavior of parallel programs at the level of individual call paths through sampling or direct instrumentation. While we can easily control measurement dilation by adjusting the sampling frequency, the statistical nature of sampling and the difficulty of accessing the parameters of sampled events make it unsuitable for obtaining certain communication metrics, such as the size of message payloads. Alternatively, direct instrumentation, which is preferable for capturing message-passing events, can excessively dilate measurements, particularly for C++ programs, which often have many short but frequently called class member functions. Thus, we combine these techniques in a unified framework that exploits the strengths of each approach while avoiding their weaknesses: We use direct instrumentation to intercept MPI routines while we record the execution of the remaining code through low-overhead sampling. One of the main technical hurdles mastered was the inexpensive and portable determination of call-path information during the invocation of MPI routines. We show that the overhead of our implementation is sufficiently low to support substantial performance improvement of a C++ fluid-dynamics code.
Title: Design of MILC Lattice QCD Application for GPU Clusters
Authors: Guochun Shi, S. Gottlieb, A. Torok, V. Kindratenko
Published in: 2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 16, 2011
DOI: 10.1109/IPDPS.2011.43 (https://doi.org/10.1109/IPDPS.2011.43)
Abstract: We present an implementation of the improved staggered quark action lattice QCD computation designed for execution on a GPU cluster. The parallelization strategy is based on dividing the space-time lattice along the time dimension and distributing the sub-lattices among the GPU cluster nodes. We provide a mixed-precision floating-point GPU implementation of the multi-mass conjugate gradient solver. Our single GPU implementation of the conjugate gradient solver achieves a 9x performance improvement over the highly optimized code executed on a state-of-the-art eight-core CPU node. The overall application executes almost six times faster on a GPU-enabled cluster vs. a conventional multi-core cluster. The developed code is currently used for running production QCD calculations with electromagnetic corrections.