2008 37th International Conference on Parallel Processing最新文献_第8页

Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches 在许多分布式片上L2缓存上驯服单线程程序性能

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.29

Lei Jin, Sangyeun Cho

{"title":"Taming Single-Thread Program Performance on Many Distributed On-Chip L2 Caches","authors":"Lei Jin, Sangyeun Cho","doi":"10.1109/ICPP.2008.29","DOIUrl":"https://doi.org/10.1109/ICPP.2008.29","url":null,"abstract":"This paper presents a two-part study on managing distributed NUCA (non-uniform cache architecture) L2caches in a future many core processor to obtain high single thread program performance. The first part of our study is a limit study where we determine data to cache slice mappings at the memory page granularity based on detailed inter-page conflict information derived from program's memory reference trace. By considering cache access latency and cache miss rate simultaneously when mapping data to L2 cache slices, this \"oracle\" scheme outperforms the conventional shared caching scheme by up to 208% with an average of 45% on a sixteen-core processor. In the second part of the study, we propose and evaluate a dynamic cache management scheme that determines the home cache slice and cache bin for memory pages without any static program information. The dynamic scheme outperforms the shared caching scheme by up to 191% with an average of 32%, achieving much of the performance we observed in the limit study. We also find that the proposed dynamic scheme adapts to multiprogrammed workloads' behavior well and performs significantly better than both the private caching scheme and the shared caching scheme.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128969325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 10

Scalable Dynamic Load Balancing Using UPC 使用UPC的可扩展动态负载平衡

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.19

Stephen L. Olivier, J. Prins

{"title":"Scalable Dynamic Load Balancing Using UPC","authors":"Stephen L. Olivier, J. Prins","doi":"10.1109/ICPP.2008.19","DOIUrl":"https://doi.org/10.1109/ICPP.2008.19","url":null,"abstract":"An asynchronous work-stealing implementation of dynamic load balance is implemented using Unified Parallel C (UPC) and evaluated using the Unbalanced Tree Search (UTS) benchmark [Olivier, S., et al., 2007]. The UTS benchmark presents a synthetic tree-structured search space that is highly imbalanced. Parallel implementation of the search requires continuous dynamic load balancing to keep all processors engaged in the search. Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC [Olivier, S., et al., 2007] and MPI [Dinan, J., et al., 2007]. We observe parallel efficiency of 80% using 1024 processors performing over 85,000 total load balancing operations per second continuously. The UPC programming model provides substantial simplifications in the expression of the asynchronous work stealing protocol compared with MPI. However, to obtain performance portability with UPC in both shared memory and distributed memory settings requires the careful use of one sided reads and writes to minimize the impact of high latency communication. Additional protocol improvements are made to improve dissemination of available work and to decrease the cost of termination detection.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129014359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 69

Overcoming Scalability Challenges for Tool Daemon Launching 克服工具守护进程启动的可伸缩性挑战

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.63

D. Ahn, D. Arnold, B. Supinski, Gregory L. Lee, B. Miller, M. Schulz

引用次数: 27

Impacts of Indirect Blocks on Buffer Cache Energy Efficiency 间接块对缓冲缓存能量效率的影响

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.60

Jianhui Yue, Yifeng Zhu, Zhao Cai

引用次数: 3

GeWave: Geographically-Aware Wave for File Consistency Maintenance in P2P Systems GeWave:用于P2P系统中文件一致性维护的地理感知Wave

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.52

Haiying Shen

{"title":"GeWave: Geographically-Aware Wave for File Consistency Maintenance in P2P Systems","authors":"Haiying Shen","doi":"10.1109/ICPP.2008.52","DOIUrl":"https://doi.org/10.1109/ICPP.2008.52","url":null,"abstract":"File consistency maintenance in P2P systems is a technique for maintaining consistency between files and their replicas. Most traditional consistency maintenance methods depend on either message spreading or structure for update propagation by pushing. Message spreading generates high overhead due to redundant messages, and cannot guarantee that every replica node receives an update. Structure-based pushing methods reduce the overhead but cannot guarantee timely consistency in churn. Moreover, most methods are unable to consider physical proximity to improve efficiency. To further reduce update overhead, enhance guarantee of consistency, and take proximity into account, this paper presents a geographically-aware Wave method (GeWave). Depending on adaptive polling in a dynamic structure, GeWave avoids redundant file updates by dynamically adapting to time-varying file update and query rates, and ensures the consistency of query results even in churn. Furthermore, it conducts update propagation between geographically close nodes in a distributed manner. Simulation results demonstrate the efficiency of GeWave in comparison with other representative consistency maintenance schemes. It dramatically reduces the overhead and yields significant improvements on efficiency and scalability of file consistency maintenance schemes.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123634491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Utility-Based Distributed Routing in Intermittently Connected Networks 间歇连接网络中基于效用的分布式路由

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.77

Ze Li, Haiying Shen

{"title":"Utility-Based Distributed Routing in Intermittently Connected Networks","authors":"Ze Li, Haiying Shen","doi":"10.1109/ICPP.2008.77","DOIUrl":"https://doi.org/10.1109/ICPP.2008.77","url":null,"abstract":"Intermittently connected mobile networks don't have a complete path from a source to a destination at most of the time. Such an environment can be found in very sparse mobile networks where nodes meet only occasionally or in wireless sensor networks where nodes always sleep to conserve energy. Current transmission approaches in such networks are primarily based on: multi-copy flooding scheme and single-copy forwarding scheme. However, they incur either high overheads due to excessive transmissions or long delay due to possible incorrect choices during forwarding. In this paper, we propose a A utility-based distributed routing algorithm with multi-copies called UDM, in which a packet is initially replicated to a certain number of its neighbor nodes, which sequentially forward those packets to the destination node based on a probabilistic routing scheme. Some buffer management methods are also proposed to further improve its performance. Theoretical analyze and simulations show that compared to epidemic routing, spray and wait routing, UDM routing scheme provides a nearly optimal delay performance with a stable packet arrive rate in the community mobility model.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124289899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 25

Parallel Inferencing for OWL Knowledge Bases OWL知识库的并行推理

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.64

R. Soma, V. Prasanna

引用次数: 85

Memory Access Scheduling Schemes for Systems with Multi-Core Processors 多核处理器系统的内存访问调度方案

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.53

Hongzhong Zheng, Jiang Lin, Zhao Zhang, Zhichun Zhu

{"title":"Memory Access Scheduling Schemes for Systems with Multi-Core Processors","authors":"Hongzhong Zheng, Jiang Lin, Zhao Zhang, Zhichun Zhu","doi":"10.1109/ICPP.2008.53","DOIUrl":"https://doi.org/10.1109/ICPP.2008.53","url":null,"abstract":"On systems with multi-core processors, the memory access scheduling scheme plays an important role not only in utilizing the limited memory bandwidth but also in balancing the program execution on all cores. In this study, we propose a scheme, called ME-LREQ, which considers the utilization of both processor cores and memory subsystem. It takes into consideration both the long-term and short-term gains of serving a memory request by prioritizing requests hitting on the row buffers and from the cores that can utilize memory more efficiently and have fewer pending requests. We have also thoroughly evaluated a set of memory scheduling schemes that differentiate and prioritize requests from different cores. Our simulation results show that for memory-intensive, multiprogramming workloads, the new policy improves the overall performance by 10.7% on average and up to 17.7% on a four-core processor, when compared with scheme that serves row buffers hit memory requests first and allows memory reads bypassing writes; and by up to 9.2% (6.4% on average) when compared with the scheme that serves requests from the core with the fewest pending requests first.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127349072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 26

Bandwidth-Efficient Continuous Query Processing over DHTs 基于dht的带宽高效连续查询处理

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.11

Yingwu Zhu

引用次数: 3

On-the-Fly Recovery of Job Input Data in Supercomputers 超级计算机中作业输入数据的动态恢复

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-01 DOI: 10.1109/ICPP.2008.28

Chao Wang, Zhe Zhang, Sudharshan S. Vazhkudai, Xiaosong Ma, F. Mueller

引用次数: 2