2010 39th International Conference on Parallel Processing: Latest Papers

Speculative Execution on GPU: An Exploratory Study
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.53
Shaoshan Liu, C. Eisenbeis, J. Gaudiot
{"title":"Speculative Execution on GPU: An Exploratory Study","authors":"Shaoshan Liu, C. Eisenbeis, J. Gaudiot","doi":"10.1109/ICPP.2010.53","DOIUrl":"https://doi.org/10.1109/ICPP.2010.53","url":null,"abstract":"We explore the possibility of using GPUs for speculative execution: we implement software value prediction techniques to accelerate programs with limited parallelism, and software speculation techniques to accelerate programs that contain runtime parallelism, which are hard to parallelize statically. Our experiment results show that due to the relatively high overhead, mapping software value prediction techniques on existing GPUs may not bring any immediate performance gain. On the other hand, although software speculation techniques introduce some overhead as well, mapping these techniques to existing GPUs can already bring some performance gain over CPU.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115702408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 25
Dynamic Switching-Frequency Scaling: Scheduling Overcommitted Domains in Xen VMM
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.36
Huacai Chen, Hai Jin, Kan Hu, Jian Huang
{"title":"Dynamic Switching-Frequency Scaling: Scheduling Overcommitted Domains in Xen VMM","authors":"Huacai Chen, Hai Jin, Kan Hu, Jian Huang","doi":"10.1109/ICPP.2010.36","DOIUrl":"https://doi.org/10.1109/ICPP.2010.36","url":null,"abstract":"Virtualization enables multiple guest operating systems to run on a single physical platform. These virtual machines may host any type of application, including concurrent HPC programs. Traditionally, VMM schedulers have focused on fairly sharing the processor resources among domains and have rarely considered VCPUs’ behaviors. However, this can result in poor application performance in overcommitted domains if concurrent programs are hosted in them. In this paper we review the properties of both Xen’s Credit and SEDF schedulers, and show how these schedulers may seriously impact the performance of communication-intensive and I/O-intensive concurrent applications in overcommitted domains. We discuss the origin of the problem theoretically, and confirm the derived conclusion on benchmarks. A novel approach, which dynamically scales the context-switching frequency by selecting variable time slices according to VCPUs’ behaviors, is then proposed to make the Credit scheduler more adaptive for concurrent applications. The experimental results show that this extended Credit scheduler can improve the performance of communication-intensive and I/O-intensive concurrent applications in overcommitted domains to the same magnitude as in undercommitted domains.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"150 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115595983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
A Theoretical Framework for Value Prediction in Parallel Systems
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.10
Shaoshan Liu, C. Eisenbeis, J. Gaudiot
{"title":"A Theoretical Framework for Value Prediction in Parallel Systems","authors":"Shaoshan Liu, C. Eisenbeis, J. Gaudiot","doi":"10.1109/ICPP.2010.10","DOIUrl":"https://doi.org/10.1109/ICPP.2010.10","url":null,"abstract":"We present here a theoretical framework towards a fundamental understanding of the effects of value prediction. Our framework consists of two parts: first, an identification of the theoretical limit of value prediction and an indication of the potential to improve parallelism through the exploitation of value predictability; second, a demonstration of the feasibility of data prediction and a theoretical support to verify this feasibility. The experiment results demonstrate the immense potential of value prediction in enhancing the performance of many-core architectures.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116528231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Distributed Minimum Transmission Multicast Routing Protocol for Wireless Sensor Networks
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.28
Long Cheng, Sajal K. Das, Jiannong Cao, Canfeng Chen, Jian Ma
{"title":"Distributed Minimum Transmission Multicast Routing Protocol for Wireless Sensor Networks","authors":"Long Cheng, Sajal K. Das, Jiannong Cao, Canfeng Chen, Jian Ma","doi":"10.1109/ICPP.2010.28","DOIUrl":"https://doi.org/10.1109/ICPP.2010.28","url":null,"abstract":"Energy efficient multicast routing is one of the fundamental problems in wireless sensor networks (WSNs). Previous work has shown that when the goal is to find multicast trees with minimum transmission cost, the problem becomes NP-complete. In this work, we present a heuristic distributed minimum transmission multicast routing protocol (MTMRP) for WSNs. By introducing the biased backoff scheme and taking advantage of the broadcast nature of wireless communication, MTMRP chooses the forwarding routes which can connect more multicast receivers. Moreover, MTMRP introduces a path handover scheme, which can prune redundant routes for multicast routing. As a result, the multicast transmission cost is reduced in a distributed manner. We conduct extensive evaluations to study the performance of the proposed MTMRP compared with existing protocols. Simulation results demonstrate that our scheme effectively improves the multicast routing energy efficiency.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122784182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 19
Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.21
Anastasios Papagiannis, Dimitrios S. Nikolopoulos
{"title":"Rearchitecting MapReduce for Heterogeneous Multicore Processors with Explicitly Managed Memories","authors":"Anastasios Papagiannis, Dimitrios S. Nikolopoulos","doi":"10.1109/ICPP.2010.21","DOIUrl":"https://doi.org/10.1109/ICPP.2010.21","url":null,"abstract":"This paper presents a new design and an implementation of the runtime system of MapReduce for heterogeneous multicore processors with explicitly managed local memories. We advance the state of the art in runtime support for MapReduce using five instruments: (1) A new multi-threaded, event-driven controller for task instantiation, task scheduling, synchronization, and bulk-synchronous execution of MapReduce stages. The controller improves utilization of control efficient cores, minimizes control overhead in the runtime system, and overlaps task instantiation with task scheduling on compute-efficient cores. (2) An implicit partitioning scheme which eliminates redundant memory copies. (3) An adaptive memory management scheme which combines efficient memory preallocation for applications with statically known output volume with dynamic allocation using runahead tasks for applications with statically unknown output volume. (4) An optimized quick-sort/merge-sort scheme which reduces the critical path length of merge-sort. (5) An optimized execution scheme which avoids redundant data transfers to and from local stores in applications that emit keys with the same value. Put together, these techniques accelerate representative MapReduce workloads by a factor of 1.81x (geometric mean) compared to a reference design that represents the state of the art.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129549905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
DISQO: A Distributed Framework for Spatial Queries over Moving Objects
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.49
Baihua Zheng, Wang-Chien Lee, Ken C. K. Lee, J. Winter, Meng-Chang Chen
{"title":"DISQO: A Distributed Framework for Spatial Queries over Moving Objects","authors":"Baihua Zheng, Wang-Chien Lee, Ken C. K. Lee, J. Winter, Meng-Chang Chen","doi":"10.1109/ICPP.2010.49","DOIUrl":"https://doi.org/10.1109/ICPP.2010.49","url":null,"abstract":"This paper presents DISQO, a DIStributed Framework for Spatial Queries over Moving Objects. Distinguished from existing work, DISQO aims at achieving high scalability and system performance in support of both snapshot and continuous spatial queries over moving objects. The design of DISQO is based on our observation that exchanging object location information and query information between the location server and moving objects can reduce communication cost and facilitate scalable query processing. Thus, DISQO is built upon the notions of roaming regions and query maps in correspondence with object location information and query information. A comprehensive performance evaluation has been conducted to demonstrate the superiority of DISQO design, compared with existing state-of-the-art frameworks for monitoring moving objects.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130612085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Near-Optimal Rendezvous Protocols for RDMA-Enabled Clusters
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.72
Matthew Small, Zheng Gu, Xin Yuan
{"title":"Near-Optimal Rendezvous Protocols for RDMA-Enabled Clusters","authors":"Matthew Small, Zheng Gu, Xin Yuan","doi":"10.1109/ICPP.2010.72","DOIUrl":"https://doi.org/10.1109/ICPP.2010.72","url":null,"abstract":"Optimizing Message Passing Interface (MPI) point-to-point communication for large messages is of paramount importance since most communications in MPI applications are performed by such operations. Remote Direct Memory Access (RDMA) allows one-sided data transfer and provides great flexibility in the design of efficient communication protocols for large messages. However, achieving high performance on RDMA-enabled clusters is still challenging due to the complexity both in communication protocols and in protocol invocation scenarios. In this work, we investigate a profile-driven compiled-assisted protocol customization approach for efficient communication on RDMA-enabled clusters. We analyze existing protocols and show that they are not ideal in many situations. By leveraging the RDMA capability, we develop a set of protocols that can provide near-optimal performance for all protocol invocation scenarios, which allows protocol customization to achieve near-optimal performance when the appropriate protocol is used for each communication. Finally, we evaluate the potential benefits of protocol customization using micro-benchmarks and application benchmarks. The results demonstrate that the proposed protocols can out-perform traditional rendezvous protocols to a large degree in many situations and that protocol customization can significantly improve MPI communication performance.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116341649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
PacketC Language and Parallel Processing of Masked Databases
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.55
R. Duncan, P. Jungck, Kenneth Ross
{"title":"PacketC Language and Parallel Processing of Masked Databases","authors":"R. Duncan, P. Jungck, Kenneth Ross","doi":"10.1109/ICPP.2010.55","DOIUrl":"https://doi.org/10.1109/ICPP.2010.55","url":null,"abstract":"Network packet processing’s increasing speeds and volume create an incentive to use parallel processing. Such processing often involves comparing selected packet data to the contents of large tables (e.g., for routing packets or controlling system access). Thus, commercial systems often use multiple network processors [1] to provide parallel processing in general and use associative memory chips to provide parallel table operations in particular. Parallel network programming is usually done in a C dialect with machine-specific extensions. The associative memory capabilities are often provided by ternary content addressable memory (TCAM) chips in order to supply the fast, masking-based searches needed in this domain. TCAM use is normally controlled by vendor software, rather than by the application developer. Thus, an application is typically restricted to a small number of predefined templates and mediated by vendor system software. Thus, application developers cannot use high-level languages to express network table operations in an intuitive, portable way, nor exploit parallel devices like TCAMs in a flexible manner. This paper presents CloudShield's packetC® language [2], a C dialect that hides most host-machine specifics, supports coarse-grain parallelism and supplies high-level data type and operator extensions for packet processing. We describe packetC’s database and record constructs that support network application table operations, including masked matching. We show how our implementation of packetC with network processors, FPGAs and TCAMs lets the user enjoy parallel performance benefits without the usual vendor constraints or reliance on hardware-specific programming.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126524629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.9
Xiaorui Wang, Kai Ma, Yefu Wang
{"title":"Achieving Fair or Differentiated Cache Sharing in Power-Constrained Chip Multiprocessors","authors":"Xiaorui Wang, Kai Ma, Yefu Wang","doi":"10.1109/ICPP.2010.9","DOIUrl":"https://doi.org/10.1109/ICPP.2010.9","url":null,"abstract":"Limiting the peak power consumption of chip multiprocessors (CMPs) has recently received a lot of attention. In order to enable chip-level power capping, the peak power consumption of on-chip L2 caches in a CMP often needs to be constrained by dynamically transitioning selected cache banks into low-power modes. However, dynamic cache resizing for power capping may cause undesired long cache access latencies, and even thread starving and thrashing, for the applications running on the CMP. In this paper, we propose a novel cache management strategy that can limit the peak power consumption of L2 caches and provide fairness guarantees, such that the cache access latencies of the application threads co-scheduled on the CMP are impacted more uniformly. Our strategy is also extended to provide differentiated cache latency guarantees that can help the OS to enforce the desired thread priorities at the architectural level and achieve desired rates of thread progress for co-scheduled applications. Our solution features a two-tier control architecture rigorously designed based on advanced feedback control theory for guaranteed control accuracy and system stability. Extensive experimental results demonstrate that our solution can achieve the desired cache power capping, fair or differentiated cache sharing, and power-performance tradeoffs for many applications.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115855373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Cubic Ring Networks: A Polymorphic Topology for Network-on-Chip
2010 39th International Conference on Parallel Processing Pub Date : 2010-09-13 DOI: 10.1109/ICPP.2010.52
B. Zafar, J. Draper, T. Pinkston
{"title":"Cubic Ring Networks: A Polymorphic Topology for Network-on-Chip","authors":"B. Zafar, J. Draper, T. Pinkston","doi":"10.1109/ICPP.2010.52","DOIUrl":"https://doi.org/10.1109/ICPP.2010.52","url":null,"abstract":"As chip multiprocessors transition from multi-core to many-core, on-chip network power is increasingly becoming a key barrier to scalability. Studies have shown that on-chip networks can consume up to 36% of the total chip power, while analysis of network traffic reveals that for extended periods of execution time, network load is well below the network capacity in many applications. In recent studies, researchers have proposed to exploit this temporal variability in network traffic to dynamically turn off links, buffers and segments of the on-chip routers. In this work, we make the case for a polymorphic topology, called Cubic Ring (cRing), that allows dynamically turning off over 30% of resources in a 2D network (and more in higher dimensional networks), with less than 5% increase in average distance. As a result, cRing networks provide an elegant way to trade off network bandwidth for lower (static) power. A complete formalism for the proposed cRing topologies and the associated routing algorithm is presented, along with evaluation under synthetic workloads.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130095692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4