2010 39th International Conference on Parallel Processing: Latest Papers

Speculative Execution on GPU: An Exploratory Study

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.53

Shaoshan Liu, C. Eisenbeis, J. Gaudiot

Citations: 25

Dynamic Switching-Frequency Scaling: Scheduling Overcommitted Domains in Xen VMM

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.36

Huacai Chen, Hai Jin, Kan Hu, Jian Huang

Abstract: Virtualization enables multiple guest operating systems to run on a single physical platform. These virtual machines may host any type of application, including concurrent HPC programs. Traditionally, VMM schedulers have focused on fairly sharing processor resources among domains and have rarely considered VCPUs' behaviors. However, this can result in poor application performance in overcommitted domains that host concurrent programs. In this paper we review the properties of both Xen's Credit and SEDF schedulers, and show how these schedulers may seriously impact the performance of communication-intensive and I/O-intensive concurrent applications in overcommitted domains. We discuss the origin of the problem theoretically, and confirm the derived conclusion on benchmarks. A novel approach, which dynamically scales the context-switching frequency by selecting variable time slices according to VCPUs' behaviors, is then proposed to make the Credit scheduler more adaptive for concurrent applications. The experimental results show that this extended Credit scheduler can improve the performance of communication-intensive and I/O-intensive concurrent applications in overcommitted domains to the same magnitude as in undercommitted domains.

Citations: 14

A Theoretical Framework for Value Prediction in Parallel Systems

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.10

Shaoshan Liu, C. Eisenbeis, J. Gaudiot

Citations: 11

Distributed Minimum Transmission Multicast Routing Protocol for Wireless Sensor Networks

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.28

Long Cheng, Sajal K. Das, Jiannong Cao, Canfeng Chen, Jian Ma

Citations: 19

Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.21

Anastasios Papagiannis, Dimitrios S. Nikolopoulos

Abstract: This paper presents a new design and an implementation of the runtime system of MapReduce for heterogeneous multicore processors with explicitly managed local memories. We advance the state of the art in runtime support for MapReduce using five instruments: (1) A new multi-threaded, event-driven controller for task instantiation, task scheduling, synchronization, and bulk-synchronous execution of MapReduce stages. The controller improves utilization of control-efficient cores, minimizes control overhead in the runtime system, and overlaps task instantiation with task scheduling on compute-efficient cores. (2) An implicit partitioning scheme which eliminates redundant memory copies. (3) An adaptive memory management scheme which combines efficient memory preallocation for applications with statically known output volume with dynamic allocation using runahead tasks for applications with statically unknown output volume. (4) An optimized quick-sort/merge-sort scheme which reduces the critical path length of merge-sort. (5) An optimized execution scheme which avoids redundant data transfers to and from local stores in applications that emit keys with the same value. Put together, these techniques accelerate representative MapReduce workloads by a factor of 1.81x (geometric mean) compared to a reference design that represents the state of the art.

Citations: 16

DISQO: A Distributed Framework for Spatial Queries over Moving Objects

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.49

Baihua Zheng, Wang-Chien Lee, Ken C. K. Lee, J. Winter, Meng-Chang Chen

Citations: 2

Near-Optimal Rendezvous Protocols for RDMA-Enabled Clusters

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.72

Matthew Small, Zheng Gu, Xin Yuan

Abstract: Optimizing Message Passing Interface (MPI) point-to-point communication for large messages is of paramount importance since most communications in MPI applications are performed by such operations. Remote Direct Memory Access (RDMA) allows one-sided data transfer and provides great flexibility in the design of efficient communication protocols for large messages. However, achieving high performance on RDMA-enabled clusters is still challenging due to the complexity both in communication protocols and in protocol invocation scenarios. In this work, we investigate a profile-driven, compiler-assisted protocol customization approach for efficient communication on RDMA-enabled clusters. We analyze existing protocols and show that they are not ideal in many situations. By leveraging the RDMA capability, we develop a set of protocols that can provide near-optimal performance for all protocol invocation scenarios, which allows protocol customization to achieve near-optimal performance when the appropriate protocol is used for each communication. Finally, we evaluate the potential benefits of protocol customization using micro-benchmarks and application benchmarks. The results demonstrate that the proposed protocols can outperform traditional rendezvous protocols to a large degree in many situations and that protocol customization can significantly improve MPI communication performance.

Citations: 8

PacketC Language and Parallel Processing of Masked Databases

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.55

R. Duncan, P. Jungck, Kenneth Ross

Abstract: Network packet processing’s increasing speeds and volume create an incentive to use parallel processing. Such processing often involves comparing selected packet data to the contents of large tables (e.g., for routing packets or controlling system access). Thus, commercial systems often use multiple network processors [1] to provide parallel processing in general and use associative memory chips to provide parallel table operations in particular. Parallel network programming is usually done in a C dialect with machine-specific extensions. The associative memory capabilities are often provided by ternary content addressable memory (TCAM) chips in order to supply the fast, masking-based searches needed in this domain. TCAM use is normally controlled by vendor software, rather than by the application developer, so an application is typically restricted to a small number of predefined templates and mediated by vendor system software. As a result, application developers cannot use high-level languages to express network table operations in an intuitive, portable way, nor exploit parallel devices like TCAMs in a flexible manner. This paper presents CloudShield's packetC® language [2], a C dialect that hides most host-machine specifics, supports coarse-grain parallelism and supplies high-level data type and operator extensions for packet processing. We describe packetC’s database and record constructs that support network application table operations, including masked matching. We show how our implementation of packetC with network processors, FPGAs and TCAMs lets the user enjoy parallel performance benefits without the usual vendor constraints or reliance on hardware-specific programming.
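
The masking-based search that the abstract attributes to TCAMs can be illustrated with a small software emulation. This is only a sketch in Python, not packetC syntax and not the authors' implementation: the names `MaskedEntry`, `masked_lookup`, and `ROUTES`, and the sample route values, are hypothetical. A real TCAM compares all entries in parallel in hardware; the loop below checks them one at a time and returns the first match.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MaskedEntry:
    """One hypothetical table row: a bit pattern plus a care-mask."""
    value: int   # bit pattern to compare against
    mask: int    # 1 bits are compared, 0 bits are "don't care"
    action: str  # illustrative payload returned on a match


def masked_lookup(table: Sequence[MaskedEntry], key: int) -> Optional[MaskedEntry]:
    """Return the first entry whose masked bits equal the key's masked bits."""
    for entry in table:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry
    return None


# Hypothetical routing table, ordered from most to least specific.
ROUTES = [
    MaskedEntry(value=0x0A010000, mask=0xFFFF0000, action="port-1"),   # 10.1.0.0/16
    MaskedEntry(value=0x0A000000, mask=0xFF000000, action="port-2"),   # 10.0.0.0/8
    MaskedEntry(value=0x00000000, mask=0x00000000, action="default"),  # matches any key
]

if __name__ == "__main__":
    hit = masked_lookup(ROUTES, 0x0A010203)  # key 10.1.2.3
    print(hit.action if hit else "no match")
```

Ordering the table from most to least specific makes first-match behave like a longest-prefix lookup in this sketch; an all-zero mask acts as a catch-all entry.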

Citations: 8

Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.9

Xiaorui Wang, Kai Ma, Yefu Wang

Abstract: Limiting the peak power consumption of chip multiprocessors (CMPs) has recently received a lot of attention. In order to enable chip-level power capping, the peak power consumption of on-chip L2 caches in a CMP often needs to be constrained by dynamically transitioning selected cache banks into low-power modes. However, dynamic cache resizing for power capping may cause undesired long cache access latencies, and even thread starving and thrashing, for the applications running on the CMP. In this paper, we propose a novel cache management strategy that can limit the peak power consumption of L2 caches and provide fairness guarantees, such that the cache access latencies of the application threads co-scheduled on the CMP are impacted more uniformly. Our strategy is also extended to provide differentiated cache latency guarantees that can help the OS to enforce the desired thread priorities at the architectural level and achieve desired rates of thread progress for co-scheduled applications. Our solution features a two-tier control architecture rigorously designed based on advanced feedback control theory for guaranteed control accuracy and system stability. Extensive experimental results demonstrate that our solution can achieve the desired cache power capping, fair or differentiated cache sharing, and power-performance tradeoffs for many applications.

Citations: 16

Cubic Ring Networks: A Polymorphic Topology for Network-on-Chip

2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.52

B. Zafar, J. Draper, T. Pinkston

Citations: 4