2011 International Conference on Parallel Processing: Latest Publications

QoS Preference-Aware Replica Selection Strategy Using MapReduce-Based PGA in Data Grids
2011 International Conference on Parallel Processing. Pub Date: 2011-09-13. DOI: 10.1109/ICPP.2011.19
Runqun Xiong, Junzhou Luo, Aibo Song, Bo Liu, Fang Dong
{"title":"QoS Preference-Aware Replica Selection Strategy Using MapReduce-Based PGA in Data Grids","authors":"Runqun Xiong, Junzhou Luo, Aibo Song, Bo Liu, Fang Dong","doi":"10.1109/ICPP.2011.19","DOIUrl":"https://doi.org/10.1109/ICPP.2011.19","url":null,"abstract":"Data replication is an important technique to reduce access latency and bandwidth consumption in Grid environment. As one of the major functions of data replication, replica selection determines the best replica according to some specific criteria in Data Grid environment, where the data resources are limited and Grid users compete for these resources. In this paper, we focus mainly on a novel QoS preference-aware replica selection strategy which will meet individual QoS sensitivity (IQS) constraints for different users/applications. We first present a framework that characterize QoS properties of replica services and establish its mathematical model by introducing quantification methods. In order to deal with the IQS constraints and to perceive Grid users' QoS preferences accurately, we propose a QoS preference acquisition algorithm based on Analytic Hierarchy Process (AHP). We then design and implement a novel effective and efficient parallel genetic algorithm (PGA) based on Map Reduce paradigm for optimizing the objective function which corresponds to the optimal replica. Simulation results show that our strategy has a better performance in validity as well as scalability, and the optimal replica can always be obtained for Grid users with different IQS constraints under Data Grid environments that vary in system loads, scheduling strategies and user types.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127010008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
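The abstract names two concrete building blocks: AHP-derived QoS preference weights and a MapReduce-based parallel GA. The paper's own quantification model is not given above, so the sketch below only illustrates the standard AHP weighting step on three hypothetical QoS criteria (latency, bandwidth, cost) with an invented pairwise comparison matrix; it is a generic illustration, not the authors' implementation. The resulting weights would feed the fitness function that a PGA of this kind optimizes.

```python
import numpy as np

# Hypothetical pairwise comparison matrix over three QoS criteria
# (latency, bandwidth, cost). A[i][j] says how strongly criterion i is
# preferred over criterion j on Saaty's 1-9 scale; values are invented.
criteria = ["latency", "bandwidth", "cost"]
A = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])

# Standard geometric-mean approximation of the principal eigenvector:
# each criterion's weight is the normalized geometric mean of its row.
row_gm = np.prod(A, axis=1) ** (1.0 / A.shape[1])
weights = row_gm / row_gm.sum()

# Usual AHP consistency check (random index RI = 0.58 for n = 3).
lambda_max = float(np.mean((A @ weights) / weights))
ci = (lambda_max - A.shape[0]) / (A.shape[0] - 1)
cr = ci / 0.58

for name, w in zip(criteria, weights):
    print(f"{name}: weight = {w:.3f}")
print(f"consistency ratio = {cr:.3f} (acceptable if < 0.1)")
```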
aMOSS: Automated Multi-objective Server Provisioning with Stress-Strain Curving
2011 International Conference on Parallel Processing. Pub Date: 2011-09-13. DOI: 10.1109/ICPP.2011.30
P. Lama, Xiaobo Zhou
{"title":"aMOSS: Automated Multi-objective Server Provisioning with Stress-Strain Curving","authors":"P. Lama, Xiaobo Zhou","doi":"10.1109/ICPP.2011.30","DOIUrl":"https://doi.org/10.1109/ICPP.2011.30","url":null,"abstract":"A modern data center built upon virtualized server clusters for hosting Internet applications has multiple correlated and conflicting objectives. Utility-based approaches are often used for optimizing multiple objectives. However, it is difficult to define a local utility function to suitably represent one objective and to apply different weights on multiple local utility functions. Furthermore, choosing weights statically may not be effective in the face of highly dynamic workloads. In this paper, we propose an automated multi-objective server provisioning with stress-strain curving approach (aMOSS). First, we formulate a multi-objective optimization problem that is to minimize the number of physical machines used, the average response time and the total number of virtual servers allocated for multi-tier applications. Second, we propose a novel stress-strain curving method to automatically select the most efficient solution from a Pareto-optimal set that is obtained as the result of a nondominated sorting based optimization technique. Third, we enhance the method to reduce server switching cost and improve the utilization of physical machines. Simulation results demonstrate that compared to utility-based approaches, aMOSS automatically achieves the most efficient tradeoff between performance and resource allocation efficiency. We implement aMOSS in a test bed of virtualized blade servers and demonstrate that it outperforms a representative dynamic server provisioning approach in achieving the average response time guarantee and in resource allocation efficiency for a multi-tier Internet service. aMOSS provides a unique perspective to tackle the challenging autonomic server provisioning problem.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126437201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
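aMOSS selects among Pareto-optimal server allocations; the stress-strain curving step itself is not described above in enough detail to reproduce. As a minimal sketch of just the nondominated-sorting idea it builds on, the following filters a set of invented candidate allocations, scored on the paper's three objectives (physical machines, average response time, virtual servers), down to the Pareto front.

```python
from typing import List, Tuple

# Each candidate allocation is scored on three objectives to minimize:
# (physical machines used, average response time in ms, virtual servers
# allocated). All candidate values below are invented placeholders.
Candidate = Tuple[int, float, int]

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Return the nondominated subset (the Pareto-optimal set)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o != c)]

candidates = [(4, 120.0, 10), (5, 90.0, 12), (4, 150.0, 9),
              (6, 80.0, 14), (5, 100.0, 12), (5, 95.0, 11)]
for member in pareto_front(candidates):
    print(member)
```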
Moving Database Systems to Multicore: An Auto-Tuning Approach
2011 International Conference on Parallel Processing. Pub Date: 2011-09-13. DOI: 10.1109/ICPP.2011.24
V. Pankratius, Martin Heneka
{"title":"Moving Database Systems to Multicore: An Auto-Tuning Approach","authors":"V. Pankratius, Martin Heneka","doi":"10.1109/ICPP.2011.24","DOIUrl":"https://doi.org/10.1109/ICPP.2011.24","url":null,"abstract":"In the multicore era, database systems are facing new challenges to exploit parallelism and scale query performance on new processors. Taking advantage of multicore, however, is not trivial and goes far beyond inserting parallel constructs into available database system code. Varying hardware characteristics require different query parallelization strategies on each multicore platform. Query optimizers at the heart of each database system have to be reengineered, but the problem is that these optimizers are complex. In addition, optimization best practices evolved during a long-term process of research and experimentation. This paper presents a successful modular technique that does not require a major rewrite of database code from scratch. We discuss the implementation details of new fine-granular parallelism approach that can be used as an add-on to existing systems and other query optimizations. We start with query execution plans that are generated by sequential optimizers. Using multithreading, we exploit parallelism within queries and within join operators, which leverages the new performance opportunities in modern multicore hardware. Our query performance optimization is adaptive and employs QJetpack, a feedback-directed auto-tuner, in a novel way. It iteratively partitions query execution plans by detecting performance patterns that are pre-benchmarked on each platform. Then, the auto-tuner steers the application of parallel transformations based on query run-time feedback. This paper focuses on difficult scenarios with I/O-intensive join queries and shows that we can speed up query execution despite significant I/O limitations. The performance of all benchmarked queries could be improved, with low tuning overhead, on all of our multicore platforms.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116558578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
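The abstract describes intra-query and intra-join parallelism layered on top of sequentially optimized execution plans. The sketch below shows only the generic pattern: a hash join whose probe phase is partitioned across a thread pool. The relations are synthetic and this is not the paper's QJetpack auto-tuner; note also that CPython's GIL limits CPU-bound speedup, so the point is the partitioning structure rather than the measured gain.

```python
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict

# Intra-operator (intra-join) parallelism, generic illustration:
# build a hash table on the smaller relation, then probe it with
# chunks of the larger relation on a thread pool. Invented data.
build_rel = [(i, f"dim-{i}") for i in range(1_000)]
probe_rel = [(i % 1_000, f"fact-{i}") for i in range(100_000)]

hash_table = defaultdict(list)
for key, payload in build_rel:
    hash_table[key].append(payload)

def probe_chunk(chunk):
    out = []
    for key, payload in chunk:
        for match in hash_table.get(key, ()):
            out.append((key, payload, match))
    return out

# Partition the probe side so each worker handles an independent chunk.
chunks = [probe_rel[i:i + 10_000] for i in range(0, len(probe_rel), 10_000)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = [row for part in pool.map(probe_chunk, chunks) for row in part]
print(f"joined rows: {len(results)}")
```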
P2P Object Tracking in the Internet of Things
2011 International Conference on Parallel Processing. Pub Date: 2011-09-13. DOI: 10.1109/ICPP.2011.14
Yanbo Wu, Quan Z. Sheng, D. Ranasinghe
{"title":"P2P Object Tracking in the Internet of Things","authors":"Yanbo Wu, Quan Z. Sheng, D. Ranasinghe","doi":"10.1109/ICPP.2011.14","DOIUrl":"https://doi.org/10.1109/ICPP.2011.14","url":null,"abstract":"With recent advances in technologies such as radio-frequency identification (RFID) and new standards such as the electronic product code (EPC), large-scale traceability is emerging as a key differentiator in a wide range of enterprise applications (e.g., counterfeit prevention, product recalls, and pilferage reduction). Such traceability applications often need to access data collected by individual enterprises in a distributed environment. Traditional centralized approaches (e.g., data ware-housing) are not feasible for these applications due to their unique characteristics such as large volume of data and sovereignty of the participants. In this paper, we describe an approach that enables applications to share traceability data across independent enterprises in a pure Peer-to-Peer (P2P) fashion. Data are stored in local repositories of participants and indexed in the network based on structured P2P overlays. In particular, we present a generic approach for efficiently indexing and locating individual objects in large, distributed traceable networks, most notably, in the emerging environment of the Internet of Things. The results from extensive experiments show that our approach scales well in both data volume and network size.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133698852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
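The approach indexes object identifiers over a structured P2P overlay while the trace data stays in each enterprise's local repository. A minimal, single-process sketch of that indexing idea follows, assuming a consistent-hashing ring; the peer names, example EPC, and repository URL are invented, and a real deployment would store each index partition on the responsible peer rather than in one dictionary.

```python
import bisect
import hashlib

# DHT-style object indexing sketch: each EPC identifier is hashed onto a
# ring, and the peer whose position follows the hash stores a pointer to
# the enterprise repository holding the actual trace data.
def h(value: str) -> int:
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

peers = ["peer-a.example", "peer-b.example", "peer-c.example"]
ring = sorted((h(p), p) for p in peers)

def responsible_peer(key: str) -> str:
    positions = [pos for pos, _ in ring]
    idx = bisect.bisect(positions, h(key)) % len(ring)
    return ring[idx][1]

index = {}  # peer -> {epc: repository URL}; in reality kept on that peer

def publish(epc: str, repository_url: str) -> None:
    index.setdefault(responsible_peer(epc), {})[epc] = repository_url

def lookup(epc: str) -> str:
    return index.get(responsible_peer(epc), {}).get(epc, "not found")

publish("urn:epc:id:sgtin:0614141.107346.2017",
        "https://repo.manufacturer.example/traces")
print(lookup("urn:epc:id:sgtin:0614141.107346.2017"))
```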
Tolerating Load Miss-Latency by Extending Effective Instruction Window with Low Complexity
2011 International Conference on Parallel Processing. Pub Date: 2011-09-13. DOI: 10.1109/ICPP.2011.73
Walter Yuan-Hwa Li, Chin-Ling Huang, C. Chung
{"title":"Tolerating Load Miss-Latency by Extending Effective Instruction Window with Low Complexity","authors":"Walter Yuan-Hwa Li, Chin-Ling Huang, C. Chung","doi":"10.1109/ICPP.2011.73","DOIUrl":"https://doi.org/10.1109/ICPP.2011.73","url":null,"abstract":"An execute-ahead processor pre-executes instructions when a load miss would stall the processor. The typical design has several components that grow with the distance to execute ahead and need to be carefully balanced for optimal performance. This paper presents a novel approach which unifies those components and therefore is easy to implement and has no trouble to balance resource investment. When executing ahead, the processor enqueues (or preserves) all instructions along with the known execution results (including register and memory) in a preserving buffer (PB). When the leading load miss is resolved, the processor dequeues the instructions and then restores the known execution results or dispatch the instructions not yet executed. The implementation overheads include PB and a run-ahead cache for forwarding memory data. Only PB grows with the distance to execute ahead. This method can be applied to both in-order and out-of-order processors. Our experiments show that a four-way superscalar out-of-order processor with a 1 K-entry PB can have 15% and 120% speedup over the baseline design for SPEC INT2000 and SPEC FP2000 benchmark suites, assuming a 128-entry instruction window and a 300-cycle memory access latency.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124573142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
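The preserving buffer (PB) is a hardware structure, so no short program can be faithful to the microarchitecture; the sketch below is only a behavioral illustration of the enqueue-during-miss / replay-after-resolution idea described above, with made-up instruction strings.

```python
from collections import deque

# Behavioral sketch (not a cycle-accurate model): while a load miss is
# outstanding, every instruction is enqueued together with its result if
# it could already be computed; once the miss resolves, entries are
# dequeued and either their saved results are restored or the
# instruction is dispatched for real execution.
preserving_buffer = deque()

def execute_ahead(instr, result_known, result=None):
    preserving_buffer.append((instr, result_known, result))

def resolve_miss():
    restored, dispatched = 0, 0
    while preserving_buffer:
        instr, result_known, result = preserving_buffer.popleft()
        if result_known:
            restored += 1      # reuse the pre-executed result
        else:
            dispatched += 1    # depended on the missing load: execute now
    return restored, dispatched

execute_ahead("add r1, r2, r3", result_known=True, result=42)
execute_ahead("ld  r4, [r1]",   result_known=False)   # depends on miss data
execute_ahead("sub r5, r4, r2", result_known=False)
print(resolve_miss())   # -> (1, 2)
```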
Probabilistic Communication and I/O Tracing with Deterministic Replay at Scale
2011 International Conference on Parallel Processing. Pub Date: 2011-09-13. DOI: 10.1109/ICPP.2011.50
Xing Wu, Karthik Vijayakumar, F. Mueller, Xiaosong Ma, P. Roth
{"title":"Probabilistic Communication and I/O Tracing with Deterministic Replay at Scale","authors":"Xing Wu, Karthik Vijayakumar, F. Mueller, Xiaosong Ma, P. Roth","doi":"10.1109/ICPP.2011.50","DOIUrl":"https://doi.org/10.1109/ICPP.2011.50","url":null,"abstract":"With today's petascale supercomputers, applications often exhibit low efficiency, such as poor communication and I/O performance, that can be diagnosed by analysis tools. However, these tools either produce extremely large trace files that complicate performance analysis, or sacrifice accuracy to collect high-level statistical information using crude averaging. This work contributes Scala-H-Trace, which features more aggressive trace compression than any previous approach, particularly for applications that do not show strict regularity in SPMD behavior. Scala-H-Trace uses histograms expressing the probabilistic distribution of arbitrary communication and I/O parameters to capture variations. Yet, where other tools fail to scale, Scala-H-Trace guarantees trace files of near constant size, even for variable communication and I/O patterns, producing trace files orders of magnitudes smaller than using prior approaches. We demonstrate the ability to collect traces of applications running on thousands of processors with the potential to scale well beyond this level. We further present the first approach to deterministically replay such probabilistic traces (a) without deadlocks and (b) in a manner closely resembling the original applications. Our results show either near constant sized traces or only sub-linear increases in trace file sizes irrespective of the number of nodes utilized. Even with the aggressively compressed histogram-based traces, our replay times are within 12% to 15% of the runtime of original codes. Such concise traces resembling the behavior of production-style codes closely and our approach of deterministic replay of probabilistic traces are without precedence.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"29 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116460890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
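Scala-H-Trace replaces per-event records with histograms of communication and I/O parameters. As a generic illustration of why that keeps traces near constant size, the sketch below buckets synthetic message sizes into fixed bins and then samples from the histogram, roughly as a probabilistic replay would; the bin edges and sizes are invented and this is not the tool's actual trace format.

```python
import random
from collections import Counter

# Instead of logging every message size, bucket sizes into a fixed set
# of bins so the trace stays near-constant in size, then sample from the
# histogram when replaying.
bins = [0, 64, 256, 1024, 4096, 65536]          # byte-size bin edges

def bin_of(size: int) -> int:
    for i in range(len(bins) - 1):
        if bins[i] <= size < bins[i + 1]:
            return i
    return len(bins) - 2

observed = [random.randint(1, 60000) for _ in range(100_000)]  # raw events
histogram = Counter(bin_of(s) for s in observed)               # compressed form

def sample_size() -> int:
    """Draw a plausible message size from the compressed histogram."""
    total = sum(histogram.values())
    r = random.uniform(0, total)
    acc = 0
    for i in sorted(histogram):
        acc += histogram[i]
        if r <= acc:
            return random.randint(bins[i], bins[i + 1] - 1)
    return bins[-1] - 1

print(dict(sorted(histogram.items())), sample_size())
```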
Making Many People Happy: Greedy Solutions for Content Distribution
2011 International Conference on Parallel Processing. Pub Date: 2011-09-13. DOI: 10.1109/ICPP.2011.31
Yunsheng Wang, Yuhong Guo, Jie Wu
{"title":"Making Many People Happy: Greedy Solutions for Content Distribution","authors":"Yunsheng Wang, Yuhong Guo, Jie Wu","doi":"10.1109/ICPP.2011.31","DOIUrl":"https://doi.org/10.1109/ICPP.2011.31","url":null,"abstract":"The increase in multimedia content makes providing good quality of service in wireless networks a challenging problem. Consider a set of users, with different content interests, connected to the same base station. The base station can only broadcast a limited amount of content, but wishes to satisfy the largest number of users. We approach this problem by considering each user as a point in a 2-D space, and each type of broadcast content as a circle. A point that is covered by a circle will be satisfied, and the closer the point is to the center of the circle, the higher the satisfaction. In this paper, we first formulate this problem as an optimal content distribution problem and show that it is NP-hard. The optimal problem can also be extended into an m-dimensional (m-D) space, and distance measurements can be expressed in a general p-norm. We then introduce three local greedy algorithms and compare their complexity. The approximation ratio of our greedy algorithms to the optimization problem is also formally analyzed in this paper. We perform extensive simulations using various conditions to evaluate our greedy algorithms. The results demonstrate that our solutions perform well and reflect our analytical results.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126385905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
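The model above (users as 2-D points, content as circles, satisfaction decaying with distance from the center) lends itself to a greedy sketch. The code below repeatedly picks the circle that adds the most total satisfaction until a broadcast budget is exhausted; the coordinates, radii, budget, and linear satisfaction function are all invented, and this is not necessarily one of the paper's three greedy variants.

```python
import math
import random

# Users are points in a 10x10 square; each content item is a circle
# (cx, cy, r). A covered user's satisfaction is 1 at the center and
# falls linearly to 0 at the circle's edge.
random.seed(1)
users = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(50)]
circles = [(random.uniform(0, 10), random.uniform(0, 10), random.uniform(1, 3))
           for _ in range(12)]
k = 3   # broadcast budget: how many content items can be sent

def satisfaction(user, circle):
    (ux, uy), (cx, cy, r) = user, circle
    d = math.hypot(ux - cx, uy - cy)
    return max(0.0, 1.0 - d / r)

def total(chosen):
    # each user counts its best (closest-covering) chosen circle
    return sum(max((satisfaction(u, c) for c in chosen), default=0.0) for u in users)

chosen = []
for _ in range(k):
    best = max((c for c in circles if c not in chosen),
               key=lambda c: total(chosen + [c]))
    chosen.append(best)
print(f"greedy satisfaction with {k} circles: {total(chosen):.2f}")
```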
Probabilistic Best-Fit Multi-dimensional Range Query in Self-Organizing Cloud
2011 International Conference on Parallel Processing. Pub Date: 2011-09-13. DOI: 10.1109/ICPP.2011.13
S. Di, Cho-Li Wang, Weida Zhang, Luwei Cheng
{"title":"Probabilistic Best-Fit Multi-dimensional Range Query in Self-Organizing Cloud","authors":"S. Di, Cho-Li Wang, Weida Zhang, Luwei Cheng","doi":"10.1109/ICPP.2011.13","DOIUrl":"https://doi.org/10.1109/ICPP.2011.13","url":null,"abstract":"With virtual machine (VM) technology being increasingly mature, computing resources in modern Cloud systems can be partitioned in fine granularity and allocated on demand with \"pay-as-you-go\" model. In this work, we study the resource query and allocation problems in a Self-Organizing Cloud (SOC), where host machines are connected by a peer-to-peer (P2P) overlay network on the Internet. To run a user task in SOC, the requester needs to perform a multi-dimensional range search over the P2P network for locating host machines that satisfy its minimal demand on each type of resources. The multi-dimensional range search problem is known to be challenging as contentions along multiple dimensions could happen in the presence of the uncoordinated analogous queries. Moreover, low resource matching rate may happen while restricting query delay and network traffic. We design a novel resource discovery protocol, namely Proactive Index Diffusion CAN (PID-CAN), which can proactively diffuse resource indexes over the nodes and randomly route query messages among them. Such a protocol is especially suitable for the range query that needs to maximize its best-fit resource shares under possible competition along multiple resource dimensions. Via simulation, we show that PID-CAN could keep stable and optimized searching performance with low query delay and traffic overhead, for various test cases under different distributions of query ranges and competition degrees. It also performs satisfactorily in dynamic node-churning situation.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126437783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
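PID-CAN answers best-fit multi-dimensional range queries over a P2P overlay. The toy below is a centralized stand-in for the query semantics only: it filters invented nodes whose free capacity meets a task's minimal demand in every dimension and ranks them by tightness of fit; the index diffusion and routing over CAN is not modeled.

```python
# Candidate nodes and their free capacity along several resource
# dimensions; names and numbers are invented for illustration.
nodes = {
    "node-1": {"cpu": 4.0, "mem_gb": 8.0, "disk_gb": 100.0},
    "node-2": {"cpu": 2.0, "mem_gb": 16.0, "disk_gb": 50.0},
    "node-3": {"cpu": 8.0, "mem_gb": 4.0, "disk_gb": 200.0},
}
demand = {"cpu": 2.0, "mem_gb": 6.0, "disk_gb": 40.0}   # task's minimal demand

def meets(free: dict, need: dict) -> bool:
    # range-query predicate: enough capacity in every dimension
    return all(free[dim] >= need[dim] for dim in need)

def surplus(free: dict, need: dict) -> float:
    # normalized leftover capacity; smaller means a tighter, better fit
    return sum((free[dim] - need[dim]) / need[dim] for dim in need)

matches = sorted((n for n, free in nodes.items() if meets(free, demand)),
                 key=lambda n: surplus(nodes[n], demand))
print("best-fit candidates:", matches)
```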
Memcached Design on High Performance RDMA Capable Interconnects
2011 International Conference on Parallel Processing. Pub Date: 2011-09-01. DOI: 10.1109/ICPP.2011.37
Jithin Jose, H. Subramoni, Miao Luo, Minjia Zhang, Jian Huang, Md. Wasi-ur-Rahman, Nusrat S. Islam, Xiangyong Ouyang, Hao Wang, S. Sur, D. Panda
{"title":"Memcached Design on High Performance RDMA Capable Interconnects","authors":"Jithin Jose, H. Subramoni, Miao Luo, Minjia Zhang, Jian Huang, Md. Wasi-ur-Rahman, Nusrat S. Islam, Xiangyong Ouyang, Hao Wang, S. Sur, D. Panda","doi":"10.1109/ICPP.2011.37","DOIUrl":"https://doi.org/10.1109/ICPP.2011.37","url":null,"abstract":"Memcached is a key-value distributed memory object caching system. It is used widely in the data-center environment for caching results of database calls, API calls or any other data. Using Memcached, spare memory in data-center servers can be aggregated to speed up lookups of frequently accessed information. The performance of Memcached is directly related to the underlying networking technology, as workloads are often latency sensitive. The existing Memcached implementation is built upon BSD Sockets interface. Sockets offers byte-stream oriented semantics. Therefore, using Sockets, there is a conversion between Memcached's memory-object semantics and Socket's byte-stream semantics, imposing an overhead. This is in addition to any extra memory copies in the Sockets implementation within the OS. Over the past decade, high performance interconnects have employed Remote Direct Memory Access (RDMA) technology to provide excellent performance for the scientific computation domain. In addition to its high raw performance, the memory-based semantics of RDMA fits very well with Memcached's memory-object model. While the Sockets interface can be ported to use RDMA, it is not very efficient when compared with low-level RDMA APIs. In this paper, we describe a novel design of Memcached for RDMA capable networks. Our design extends the existing open-source Memcached software and makes it RDMA capable. We provide a detailed performance comparison of our Memcached design compared to unmodified Memcached using Sockets over RDMA and 10Gigabit Ethernet network with hardware-accelerated TCP/IP. Our performance evaluation reveals that latency of Memcached Get of 4KB size can be brought down to 12 µs using ConnectX InfiniBand QDR adapters. Latency of the same operation using older generation DDR adapters is about 20µs. These numbers are about a factor of four better than the performance obtained by using 10GigE with TCP Offload. In addition, these latencies of Get requests over a range of message sizes are better by a factor of five to ten compared to IP over InfiniBand and Sockets Direct Protocol over InfiniBand. Further, throughput of small Get operations can be improved by a factor of six when compared to Sockets over 10 Gigabit Ethernet network. Similar factor of six improvement in throughput is observed over Sockets Direct Protocol using ConnectX QDR adapters. To the best of our knowledge, this is the first such memcached design on high performance RDMA capable interconnects.","PeriodicalId":115365,"journal":{"name":"2011 International Conference on Parallel Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134422606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 188
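The paper's contribution is an RDMA-verbs transport inside Memcached itself, which a short sketch cannot reproduce. For context, the snippet below only measures the baseline the design is compared against, a Get over the ordinary Sockets path, assuming a memcached server already running on localhost:11211 and the third-party pymemcache client.

```python
import time
from pymemcache.client.base import Client   # third-party: pip install pymemcache

# Baseline latency over the standard Sockets path. This does not (and
# cannot, from Python) exercise the RDMA-verbs transport described in
# the paper; it only gives a feel for the Sockets-based numbers the
# abstract compares against.
client = Client(("localhost", 11211))

value = b"x" * 4096                 # 4 KB object, matching the Get size quoted above
client.set("bench-key", value)

samples = 10_000
start = time.perf_counter()
for _ in range(samples):
    client.get("bench-key")
elapsed = time.perf_counter() - start
print(f"avg Get latency over sockets: {elapsed / samples * 1e6:.1f} us")
```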