2008 37th International Conference on Parallel Processing: Latest Publications

TPTS: A Novel Framework for Very Fast Manycore Processor Architecture Simulation

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.7

Sangyeun Cho, Socrates Demetriades, Shayne Evans, Lei Jin, Hyunjin Lee, Kiyeon Lee, Michael Moeng

Citations: 37

Achieving Multi-Level Parallelism in the Filter-Labeled Stream Programming Model

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.72

George Teodoro, Daniel Fireman, Dorgival Olavo Guedes Neto, Wagner Meira Jr, R. Ferreira

Abstract: New architectural trends in chip design have resulted in machines with multiple processing units as well as efficient communication networks, leading to the wide availability of systems that provide multiple levels of parallelism, both inter- and intra-machine. Developing applications that make efficient use of such systems is a challenge, especially for application-domain programmers. In this paper we present a new version of the Anthill programming environment that efficiently exploits multi-level parallelism, along with experimental results that demonstrate this efficiency. Anthill is based on the filter-stream model, in which applications are decomposed into a set of filters communicating through streams; this model has already been shown to be efficient for expressing inter-machine parallelism. We replaced the filter run-time environment, originally process-oriented, with an event-oriented version. This new version allows programmers to efficiently express opportunities for parallelism within each compute node through a higher-level programming abstraction. We evaluated our solution on dual- and quad-core machines with two data mining applications: Eclat and KNN. Both had drops in execution time nearly proportional to the number of cores on a single machine. When using a cluster of dual-core machines, speed-ups were close to linear in the number of available cores for both applications, confirming that event-oriented Anthill performs well at both the inter- and intra-machine parallelism levels.

Citations: 18
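As a rough illustration of the filter-stream model described in the Anthill abstract above, the sketch below chains filters through streams and lets several workers inside one filter handle events concurrently. It is not Anthill's API: the names start_filter, run_pipeline, and SENTINEL are hypothetical, streams are modeled with Python queues, and the worker threads only show the structure (CPython threads do not provide CPU parallelism for pure-Python handlers).

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker

def start_filter(handler, inbox, outbox, workers):
    """Start `workers` threads that apply `handler` to each event from `inbox`
    and write the result to `outbox` (one event-driven filter)."""
    def loop():
        while True:
            event = inbox.get()
            if event is SENTINEL:
                inbox.put(SENTINEL)  # leave the marker for sibling workers
                return
            outbox.put(handler(event))
    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads

def run_pipeline(events, handlers, workers=2):
    """Connect one filter per handler with streams and push `events` through."""
    streams = [queue.Queue() for _ in range(len(handlers) + 1)]
    stages = [start_filter(h, streams[i], streams[i + 1], workers)
              for i, h in enumerate(handlers)]
    for e in events:
        streams[0].put(e)
    streams[0].put(SENTINEL)
    for i, threads in enumerate(stages):
        for t in threads:
            t.join()
        streams[i + 1].put(SENTINEL)  # stage i is done: close its output stream
    results = []
    while True:
        item = streams[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    return results

if __name__ == "__main__":
    # Two filters: square each value, then add one. Output order may vary.
    print(sorted(run_pipeline(range(5), [lambda x: x * x, lambda x: x + 1])))
```

Because several workers serve each filter, results can arrive out of order; callers that need ordering would have to tag events or sort afterwards.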

Scalability Evaluation and Optimization of Multi-Core SIP Proxy Server

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.30

Jia Zou, Zhiyong Liang, Yiqi Dai

Abstract: The Session Initiation Protocol (SIP) is a popular signaling protocol used in many collaborative applications such as VoIP, instant messaging, and presence. In this paper, we evaluate a well-known SIP proxy server, OpenSER, on two multi-core platforms: SUN Niagara and Intel Clovertown, running Solaris and Linux respectively. Through the evaluation, we identify three factors that determine the performance scalability of the OpenSER server. One lies inside the operating systems: overhead from the coarse-grained locks used in the UDP socket layer. The other two are specific to the multi-process programming model: (1) overhead caused by passing socket descriptors among processes; (2) overhead from sharing transaction objects among processes. To remedy these problems, we propose several incremental optimizations, including an out-of-box dispatcher, a light-weight connection dispatcher, and dataset partition, and achieve significant improvements. For UDP and TCP transport on SUN Niagara, speedup (ideal is 8) improves from 1.5 to 5.8 and from 2.2 to 6.2, respectively; on Intel Clovertown, speedup (ideal is 8) improves from 1.2 to 3.1 and from 2.6 to 4.8, respectively.

Citations: 9
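To put the speedup figures from the SIP proxy abstract above in perspective, the short sketch below converts each reported speedup into a fraction of the stated ideal of 8. The numbers are copied from the abstract; the names REPORTED, efficiency, and summarize are illustrative only and do not come from the paper.

```python
IDEAL_SPEEDUP = 8  # "ideal is 8" in the abstract

# (platform, transport) -> (speedup before, speedup after), as reported
REPORTED = {
    ("SUN Niagara", "UDP"): (1.5, 5.8),
    ("SUN Niagara", "TCP"): (2.2, 6.2),
    ("Intel Clovertown", "UDP"): (1.2, 3.1),
    ("Intel Clovertown", "TCP"): (2.6, 4.8),
}

def efficiency(speedup, ideal=IDEAL_SPEEDUP):
    """Fraction of the ideal speedup actually achieved."""
    return speedup / ideal

def summarize(reported=REPORTED):
    """Return {(platform, transport): (efficiency before, efficiency after)}."""
    return {key: (efficiency(before), efficiency(after))
            for key, (before, after) in reported.items()}

if __name__ == "__main__":
    for (platform, transport), (before, after) in summarize().items():
        print(f"{platform} {transport}: {before:.1%} -> {after:.1%}")
```

On these figures, the optimized UDP path on SUN Niagara reaches 5.8 / 8 = 72.5% of the ideal, while Intel Clovertown over UDP reaches 3.1 / 8, just under 39%.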

On Modeling Fault Tolerance of Gossip-Based Reliable Multicast Protocols

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.10

Xiaopeng Fan, Jiannong Cao, Weigang Wu, M. Raynal

Citations: 7

Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.62

K. Kourtis, G. Goumas, N. Koziris

Citations: 37

Performance of HPC Middleware over InfiniBand WAN

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.75

S. Narravula, H. Subramoni, P. Lai, R. Noronha, D. Panda

Abstract: High performance interconnects such as InfiniBand (IB) have enabled large scale deployments of High Performance Computing (HPC) systems. High performance communication and I/O middleware such as MPI and NFS over RDMA have also been redesigned to leverage the performance of these modern interconnects. With the advent of long haul InfiniBand (IB WAN), IB applications now have inter-cluster reach. While this technology is intended to enable high performance network connectivity across WAN links, it is important to study and characterize the actual performance that existing IB middleware achieves in these emerging IB WAN scenarios. In this paper, we study and analyze the performance characteristics of the following three HPC middleware: (i) IPoIB (IP traffic over IB), (ii) MPI, and (iii) NFS over RDMA. We utilize the Obsidian IB WAN routers for inter-cluster connectivity. Our results show that many of the applications absorb smaller network delays fairly well. However, most approaches are severely impacted in high delay scenarios, and communication protocols need to be optimized for higher delays to improve performance. We propose several such optimizations. Our experimental results show that techniques such as WAN-aware protocols, transferring data using large messages (message coalescing), and using parallel data streams can improve communication performance (by up to 50%) in high delay scenarios. Overall, these results demonstrate that IB WAN technologies can enable the cluster-of-clusters architecture as a feasible platform for HPC systems.

Citations: 18

Bounded LSH for Similarity Search in Peer-to-Peer File Systems

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.25

Yu Hua, Bin Xiao, D. Feng, Bo Yu

Citations: 23

Ocean-Atmosphere Modelization over the Grid

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.37

Y. Caniou, E. Caron, G. Charrier, Andréea Chis, F. Desprez, E. Maisonnave

Citations: 7

Flash Data Dissemination in Unstructured Peer-to-Peer Networks

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.66

Antonis Papadimitriou, A. Delis

Citations: 4

Scioto: A Framework for Global-View Task Parallelism

2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.44

James Dinan, S. Krishnamoorthy, D. B. Larkins, J. Nieplocha, P. Sadayappan

Abstract: We introduce Scioto, shared collections of task objects, a lightweight framework for providing task management on distributed memory machines under one-sided and global-view parallel programming models. Scioto provides locality-aware dynamic load balancing and interoperates with MPI, ARMCI, and Global Arrays. Additionally, Scioto's task model and programming interface are compatible with many other existing parallel models, including UPC, SHMEM, and CAF. Through task parallelism, the Scioto framework provides a solution for overcoming irregularity, load imbalance, and heterogeneity, as well as dynamic mapping of computation onto emerging architectures. In this paper, we present the design and implementation of the Scioto framework and demonstrate its effectiveness on the unbalanced tree search (UTS) benchmark and two quantum chemistry codes: the closed shell self-consistent field (SCF) method and a sparse tensor contraction kernel extracted from a coupled cluster computation. We explore the efficiency and scalability of Scioto through these sample applications and demonstrate that it offers low overhead, achieves good performance on heterogeneous and multicore clusters, and scales to hundreds of processors.

Citations: 75