2008 37th International Conference on Parallel Processing: Latest Publications

TPTS: A Novel Framework for Very Fast Manycore Processor Architecture Simulation
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.7
Sangyeun Cho, Socrates Demetriades, Shayne Evans, Lei Jin, Hyunjin Lee, Kiyeon Lee, Michael Moeng
Abstract: The slow speed of conventional execution-driven architecture simulators is a serious impediment to obtaining desirable research productivity. This paper proposes and evaluates a fast manycore processor simulation framework called two-phase trace-driven simulation (TPTS), which splits detailed timing simulation into a trace generation phase and a trace simulation phase. Much of the simulation overhead caused by uninteresting architectural events is incurred only once, during the trace generation phase, and can be omitted in the repeated trace-driven simulations. We design and implement tsim, an event-driven manycore processor simulator based on the proposed TPTS framework that models the memory hierarchy, interconnect, and coherence protocol in detail. By applying aggressive event filtering, tsim achieves an impressive simulation speed of 146 MIPS when running 16-thread parallel applications.
Citations: 37
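The two-phase idea in the TPTS abstract above can be illustrated with a toy example. This is a minimal sketch under loud assumptions, not tsim: the direct-mapped cache model, the function names `generate_trace` and `simulate_l2`, and the choice of "L1 hits" as the uninteresting events are all hypothetical, and no timing is modeled. Phase 1 runs once and keeps only L1 misses; phase 2 replays that shorter trace against several L2 sizes.

```python
from typing import List, Optional


def direct_mapped_misses(addresses: List[int], num_lines: int, line_size: int) -> List[int]:
    """Return the addresses that miss in a direct-mapped cache (toy model, no timing)."""
    tags: List[Optional[int]] = [None] * num_lines
    misses = []
    for addr in addresses:
        block = addr // line_size      # cache block that holds this address
        index = block % num_lines      # line the block maps to
        if tags[index] != block:       # miss: install the block and record the event
            tags[index] = block
            misses.append(addr)
    return misses


def generate_trace(addresses: List[int], l1_lines: int = 4, line_size: int = 64) -> List[int]:
    """Phase 1 (run once): filter out L1 hits, keeping only the misses as the trace."""
    return direct_mapped_misses(addresses, l1_lines, line_size)


def simulate_l2(trace: List[int], l2_lines: int, line_size: int = 64) -> int:
    """Phase 2 (run many times): replay the filtered trace against one L2 configuration."""
    return len(direct_mapped_misses(trace, l2_lines, line_size))


if __name__ == "__main__":
    accesses = [0, 8, 64, 0, 4096, 0]
    trace = generate_trace(accesses)   # built once
    for lines in (64, 128):            # reused for every L2 configuration
        print(lines, simulate_l2(trace, lines))
```

The point of the split is that the cost of processing the full access stream is paid only in `generate_trace`; each additional configuration study touches just the filtered events.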
Achieving Multi-Level Parallelism in the Filter-Labeled Stream Programming Model
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.72
George Teodoro, Daniel Fireman, Dorgival Olavo Guedes Neto, Wagner Meira Jr, R. Ferreira
Abstract: New architectural trends in chip design have resulted in machines with multiple processing units as well as efficient communication networks, leading to the wide availability of systems that provide multiple levels of parallelism, both inter- and intra-machine. Developing applications that make efficient use of such systems is a challenge, especially for application-domain programmers. In this paper we present a new version of the Anthill programming environment that efficiently exploits multi-level parallelism, along with experimental results that demonstrate this efficiency. Anthill is based on the filter-stream model; in this model, applications are decomposed into a set of filters communicating through streams, which has already been shown to be efficient for expressing inter-machine parallelism. We replaced the filter run-time environment, originally process-oriented, with an event-oriented version. This new version allows programmers to efficiently express opportunities for parallelism within each compute node through a higher-level programming abstraction. We evaluated our solution on dual- and quad-core machines with two data mining applications: Eclat and KNN. Both had drops in execution time nearly proportional to the number of cores on a single machine. When using a cluster of dual-core machines, speed-ups were close to linear in the number of available cores for both applications, confirming that event-oriented Anthill performs well at both the inter- and intra-machine parallelism levels.
Citations: 18
Scalability Evaluation and Optimization of Multi-Core SIP Proxy Server
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.30
Jia Zou, Zhiyong Liang, Yiqi Dai
Abstract: The Session Initiation Protocol (SIP) is a popular signaling protocol used in many collaborative applications such as VoIP, instant messaging, and presence. In this paper, we evaluate one well-known SIP proxy server (OpenSER) on two multi-core platforms: SUN Niagara and Intel Clovertown, running Solaris and Linux respectively. Through the evaluation, we identify three factors that determine the performance scalability of the OpenSER server. One lies inside the operating systems: overhead from the coarse-grained locks used in the UDP socket layer. The other two are specific to the multi-process programming model: (1) overhead caused by passing socket descriptors among processes; (2) overhead caused by sharing transaction objects among processes. To remedy these problems, we propose several incremental optimizations, including an out-of-box dispatcher, a light-weight connection dispatcher, and dataset partitioning, and achieve significant improvements. For UDP and TCP transport on SUN Niagara, speedups (ideal is 8) improve from 1.5 to 5.8 and from 2.2 to 6.2, respectively; on Intel Clovertown, speedups (ideal is 8) improve from 1.2 to 3.1 and from 2.6 to 4.8, respectively.
Citations: 9
On Modeling Fault Tolerance of Gossip-Based Reliable Multicast Protocols
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.10
Xiaopeng Fan, Jiannong Cao, Weigang Wu, M. Raynal
Abstract: Gossiping has been widely used for disseminating data in large-scale networks. Existing work has mainly focused on the design of gossip-based protocols, and few studies have developed models for analyzing the fault-tolerance properties of these protocols. In this paper, we propose a general gossiping algorithm and develop a mathematical model based on generalized random graphs for evaluating the reliability of gossiping, i.e., to what extent gossip-based protocols can tolerate node failures yet still guarantee the specified message delivery. We analytically derive the maximum ratio of failed nodes that can be tolerated without reducing the required degree of reliability. We also investigate the impact of two parameters, the fanout distribution and the non-failed member ratio, on protocol reliability. Simulations have been carried out to validate the effectiveness of our analytic model in terms of the reliability of gossiping and the success of gossiping. The results obtained can be used to guide the design of fault-tolerant gossip-based protocols.
Citations: 7
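The two parameters named in the abstract above (fanout and non-failed member ratio) can be explored with a small simulation. This is only a sketch under stated assumptions, not the paper's analytic model: it uses a fixed fanout instead of a fanout distribution, fails nodes uniformly at random before gossiping starts, and the function name `simulate_gossip` is hypothetical.

```python
import random


def simulate_gossip(n: int, fanout: int, alive_ratio: float, seed: int = 0) -> float:
    """Return the fraction of non-failed members that receive one gossiped message."""
    rng = random.Random(seed)
    members = list(range(n))
    num_alive = max(1, round(n * alive_ratio))
    alive = set(rng.sample(members, num_alive))  # failed members silently drop messages
    source = min(alive)                          # deterministic choice of the initiator
    delivered = {source}
    frontier = [source]
    while frontier:
        next_frontier = []
        for node in frontier:
            # Each infected member forwards once to `fanout` members chosen at random,
            # without knowing which of them have failed.
            others = [m for m in members if m != node]
            for target in rng.sample(others, min(fanout, len(others))):
                if target in alive and target not in delivered:
                    delivered.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return len(delivered) / len(alive)


if __name__ == "__main__":
    for ratio in (1.0, 0.8, 0.6, 0.4):
        print(ratio, simulate_gossip(n=200, fanout=4, alive_ratio=ratio))
```

Sweeping `alive_ratio` for a given `fanout` gives an empirical feel for the kind of threshold the paper derives analytically: below some fraction of non-failed members, the delivered fraction falls away from 1.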
Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.62
K. Kourtis, G. Goumas, N. Koziris
Abstract: The sparse matrix-vector multiplication kernel exhibits limited potential for taking advantage of modern shared-memory architectures due to its large memory bandwidth requirements. To decrease memory contention and improve the performance of the kernel, we propose two compression schemes. The first, called CSR-DU, targets the reduction of the matrix structural data by applying coarse-grained delta encoding to the column indices. The second, called CSR-VI, targets the reduction of the numerical values using indirect indexing and can only be applied to matrices that contain a small number of unique values. Evaluation of both methods on a rich matrix set showed that they can significantly improve the performance of the multithreaded version of the kernel and achieve good scalability for large matrices.
Citations: 37
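The two compression ideas in the abstract above can be sketched on a plain CSR matrix. This is an illustrative simplification, not the paper's format: real CSR-DU groups deltas into coarse-grained units with compact integer widths, whereas the sketch stores one Python integer per delta, and all function names here are hypothetical.

```python
from typing import Dict, List, Tuple


def value_index_encode(values: List[float]) -> Tuple[List[float], List[int]]:
    """CSR-VI idea: keep each distinct value once and store a small index per nonzero."""
    unique: List[float] = []
    position: Dict[float, int] = {}
    val_ind: List[int] = []
    for v in values:
        if v not in position:
            position[v] = len(unique)
            unique.append(v)
        val_ind.append(position[v])
    return unique, val_ind


def delta_encode_columns(row_ptr: List[int], col_ind: List[int]) -> List[int]:
    """CSR-DU idea (simplified): per row, store differences between consecutive columns."""
    deltas = []
    for r in range(len(row_ptr) - 1):
        prev = 0  # the first delta in a row is the column index itself
        for k in range(row_ptr[r], row_ptr[r + 1]):
            deltas.append(col_ind[k] - prev)
            prev = col_ind[k]
    return deltas


def spmv_compressed(row_ptr, col_deltas, unique, val_ind, x):
    """Compute y = A x directly from the delta-encoded columns and indexed values."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(row_ptr) - 1):
        col, acc = 0, 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            col += col_deltas[k]                 # rebuild the column index on the fly
            acc += unique[val_ind[k]] * x[col]   # indirect lookup of the numerical value
        y[r] = acc
    return y


if __name__ == "__main__":
    row_ptr = [0, 2, 3, 6]
    col_ind = [0, 2, 1, 0, 1, 2]
    values = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
    unique, val_ind = value_index_encode(values)
    deltas = delta_encode_columns(row_ptr, col_ind)
    print(spmv_compressed(row_ptr, deltas, unique, val_ind, [1.0, 2.0, 3.0]))
```

Both encodings trade a little extra arithmetic per nonzero for less data read from memory, which is the trade the abstract argues for in a bandwidth-bound kernel.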
Performance of HPC Middleware over InfiniBand WAN
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.75
S. Narravula, H. Subramoni, P. Lai, R. Noronha, D. Panda
Abstract: High-performance interconnects such as InfiniBand (IB) have enabled large-scale deployments of High Performance Computing (HPC) systems. High-performance communication and I/O middleware such as MPI and NFS over RDMA have also been redesigned to leverage the performance of these modern interconnects. With the advent of long-haul InfiniBand (IB WAN), IB applications now have inter-cluster reach. While this technology is intended to enable high-performance network connectivity across WAN links, it is important to study and characterize the actual performance that existing IB middleware achieves in these emerging IB WAN scenarios. In this paper, we study and analyze the performance characteristics of the following three HPC middleware layers: (i) IPoIB (IP traffic over IB), (ii) MPI, and (iii) NFS over RDMA. We utilize the Obsidian IB WAN routers for inter-cluster connectivity. Our results show that many of the applications absorb smaller network delays fairly well. However, most approaches are severely impacted in high-delay scenarios. Further, communication protocols need to be optimized in higher-delay scenarios to improve performance. In this paper, we propose several such optimizations. Our experimental results show that techniques such as WAN-aware protocols, transferring data using large messages (message coalescing), and using parallel data streams can improve communication performance (by up to 50%) in high-delay scenarios. Overall, these results demonstrate that IB WAN technologies can enable a cluster-of-clusters architecture as a feasible platform for HPC systems.
Citations: 18
Bounded LSH for Similarity Search in Peer-to-Peer File Systems
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.25
Yu Hua, Bin Xiao, D. Feng, Bo Yu
Abstract: Similarity search has been widely studied in peer-to-peer environments. In this paper, we propose the Bounded Locality Sensitive Hashing (Bounded LSH) method for similarity search in P2P file systems. Compared to basic Locality Sensitive Hashing (LSH), Bounded LSH improves space efficiency and query response time in similarity search, especially for high-dimensional data objects that exhibit a non-uniform distribution. We present a simple and space-efficient Bounded LSH that maps a non-uniform data space into load-balanced hash buckets containing approximately equal numbers of objects. Load-balanced hash buckets in Bounded LSH, in turn, require fewer hash tables while maintaining a high probability of returning the objects closest to a request. Our experiments on synthetic and real-world datasets show the feasibility and the query and space efficiency of our proposed method.
Citations: 23
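For orientation, the basic LSH scheme that the abstract above uses as its baseline can be sketched as follows. This is not Bounded LSH and includes no bucket load balancing; it is a minimal sketch assuming random-projection hash functions of the form floor((a·v + b) / w), and the class name `BasicLSH` and its default parameters are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import List, Optional


class BasicLSH:
    """Several hash tables, each keyed by a tuple of quantized random projections."""

    def __init__(self, dim: int, num_tables: int = 4, hashes_per_table: int = 3,
                 width: float = 4.0, seed: int = 0):
        rng = random.Random(seed)
        self.width = width
        self.tables = []
        for _ in range(num_tables):
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                     for _ in range(hashes_per_table)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v: List[float]):
        # Nearby vectors tend to fall into the same quantization cell of each projection.
        return tuple(
            math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
            for a, b in funcs
        )

    def insert(self, name: str, v: List[float]) -> None:
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append((name, v))

    def query(self, v: List[float]) -> Optional[str]:
        """Return the closest stored object among bucket candidates, or None."""
        candidates = {}
        for funcs, buckets in self.tables:
            for name, u in buckets.get(self._key(funcs, v), []):
                candidates[name] = u
        if not candidates:
            return None
        return min(candidates, key=lambda n: math.dist(candidates[n], v))


if __name__ == "__main__":
    index = BasicLSH(dim=3)
    index.insert("a", [1.0, 2.0, 3.0])
    index.insert("b", [50.0, -40.0, 7.0])
    print(index.query([1.0, 2.0, 3.0]))
```

In this baseline, skewed data leaves some buckets crowded and others empty, so more tables are needed to keep recall high; that is the cost the abstract says load-balanced buckets reduce.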
Ocean-Atmosphere Modelization over the Grid
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.37
Y. Caniou, E. Caron, G. Charrier, Andréea Chis, F. Desprez, E. Maisonnave
Abstract: In this paper, we tackle the problem of scheduling an Ocean-Atmosphere application used for climate prediction on the grid. An experiment is composed of several 1D-meshes of identical DAGs composed of parallel tasks. To obtain a good completion time, we divide the processors into groups, each working on parallel tasks. The group sizes are chosen by computing the best makespan for several grouping possibilities. We improved this heuristic method by different means. The improvement yielding the best makespan is to represent the problem as an instance of the knapsack problem. As this heuristic was originally designed for homogeneous platforms, we present its adaptation to heterogeneous platforms. Simulations show makespan improvements of up to 12%.
Citations: 7
Flash Data Dissemination in Unstructured Peer-to-Peer Networks
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.66
Antonis Papadimitriou, A. Delis
Abstract: The problem of flash data dissemination refers to spreading dynamically created, medium-sized data to all members of a large group of users. In this paper, we explore a solution to the problem of flash data dissemination in unstructured P2P networks and propose a gossip-based protocol, termed catalogue-gossip. Our protocol alleviates the shortcomings of prior gossip-based dissemination approaches through the introduction of an efficient catalogue exchange scheme that helps reduce unnecessary interactions among nodes in the unstructured network. We provide deterministic guarantees for the termination of the protocol and suggest optimizations concerning the order in which pieces of flash data are assembled at receiving peers. Experimental results show that catalogue-gossip is significantly more efficient than existing solutions in delivering flash data.
Citations: 4
Scioto: A Framework for Global-View Task Parallelism
2008 37th International Conference on Parallel Processing Pub Date : 2008-09-09 DOI: 10.1109/ICPP.2008.44
James Dinan, S. Krishnamoorthy, D. B. Larkins, J. Nieplocha, P. Sadayappan
Abstract: We introduce Scioto (shared collections of task objects), a lightweight framework for providing task management on distributed-memory machines under one-sided and global-view parallel programming models. Scioto provides locality-aware dynamic load balancing and interoperates with MPI, ARMCI, and Global Arrays. Additionally, Scioto's task model and programming interface are compatible with many other existing parallel models, including UPC, SHMEM, and CAF. Through task parallelism, the Scioto framework provides a solution for overcoming irregularity, load imbalance, and heterogeneity, as well as for dynamic mapping of computation onto emerging architectures. In this paper, we present the design and implementation of the Scioto framework and demonstrate its effectiveness on the unbalanced tree search (UTS) benchmark and two quantum chemistry codes: the closed-shell self-consistent field (SCF) method and a sparse tensor contraction kernel extracted from a coupled cluster computation. We explore the efficiency and scalability of Scioto through these sample applications and demonstrate that it offers low overhead, achieves good performance on heterogeneous and multicore clusters, and scales to hundreds of processors.
Citations: 75