2015 IEEE International Conference on Cluster Computing: Latest Papers

Expressing Parallelism on Many-Core for Deterministic Discrete Ordinates Transport
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.127
Tom Deakin, Simon McIntosh-Smith, W. Gaudin
Abstract: In this paper we demonstrate techniques for increasing the node-level parallelism of a deterministic discrete ordinates neutral particle transport algorithm on a structured mesh to exploit many-core technologies. Transport calculations form a large part of the computational workload of physical simulations and so good performance is vital for the simulations to complete in reasonable time. We will demonstrate our approach utilizing the SNAP mini-app, which gives a simplified implementation of the full transport algorithm but remains similar enough to the real algorithm to act as a useful proxy for research purposes. We present an OpenCL implementation of our improved algorithm which demonstrates a speedup of up to 2.5x the transport sweep performance on a many-core GPGPU device compared to a state-of-the-art multi-core node, the first time this scale of speedup has been achieved for algorithms of this class.
Citations: 4
The Cost of Synchronizing Imbalanced Processes in Message Passing Systems
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.63
I. Peng, S. Markidis, E. Laure
Abstract: Synchronization in message passing systems is achieved by communication among processes. System and architectural noise and different workloads cause processes to be imbalanced and to reach synchronization points at different time. Thus, both communication and imbalance impact the synchronization performance. In this paper, we study the algorithmic properties that allow the communication in synchronization to absorb the initial imbalance among processes. We quantify the imbalance absorption properties of different barrier algorithms using a LogP Monte Carlo simulator. We found that linear and f-way tournament barriers can absorb up to 95% of random exponential imbalance with the standard deviation equal to the communication time for one message. Dissemination, butterfly and pairwise exchange barriers, on the other hand, do not absorb imbalance but can effectively bound the post-barrier imbalance. We identify that synchronization transits from communication-dominated to imbalance-dominated when the standard deviation of imbalance distribution is more than twice the communication time for one message. In our study, f-way tournament barriers provided the best imbalance absorption rate and convenient communication time.
Citations: 11
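The dissemination barrier named in the abstract above is commonly described as running ceil(log2 P) rounds, with process i signalling process (i + 2^k) mod P in round k. The Python sketch below enumerates that schedule under this textbook description. It is not taken from the paper, it models no timing or imbalance, and the names `dissemination_rounds` and `all_informed` are hypothetical, chosen only for illustration.

```python
def dissemination_rounds(num_procs):
    """Return one list of (sender, receiver) pairs per barrier round.

    Assumed textbook schedule: in round k, process i signals
    process (i + 2**k) mod num_procs.
    """
    rounds = []
    step = 1
    while step < num_procs:
        rounds.append([(i, (i + step) % num_procs) for i in range(num_procs)])
        step *= 2
    return rounds


def all_informed(num_procs):
    """Check that every process has (transitively) heard from all others."""
    known = [{i} for i in range(num_procs)]
    for pairs in dissemination_rounds(num_procs):
        # Messages in one round carry what the sender knew before the round.
        snapshot = [set(s) for s in known]
        for sender, receiver in pairs:
            known[receiver] |= snapshot[sender]
    return all(len(s) == num_procs for s in known)


if __name__ == "__main__":
    for round_no, pairs in enumerate(dissemination_rounds(5)):
        print(round_no, pairs)
    print(all_informed(5))
```

The check in `all_informed` shows why the round count is logarithmic: the set of processes each participant has heard from doubles every round until it covers everyone.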
Distributed-Memory Algorithms for Maximal Cardinality Matching Using Matrix Algebra
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.62
A. Azad, A. Buluç
Abstract: We design and implement distributed-memory parallel algorithms for computing maximal cardinality matching in a bipartite graph. Relying on matrix algebra building blocks, our algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. In contrast to existing parallel algorithms, empirical approximation ratios of the new algorithms are insensitive to concurrency and stay relatively constant with increasing processor counts. On real instances, our algorithms achieve up to 300x speedup on 1024 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 processors.
Citations: 6
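A maximal matching, as targeted by the abstract above, is one to which no further edge can be added; it need not be a maximum matching, which is why the abstract speaks of approximation ratios. The serial greedy sketch below illustrates only that definition. It is not the authors' distributed matrix-algebra algorithm, and the names `greedy_maximal_matching` and `adj` are hypothetical.

```python
def greedy_maximal_matching(adj):
    """Greedily match row vertices to column vertices of a bipartite graph.

    adj maps each row vertex to an iterable of adjacent column vertices.
    Returns a dict {row: column}. The result is maximal (no unmatched
    row has an unmatched neighbour) but not necessarily maximum.
    """
    matched_cols = set()
    matching = {}
    for row, cols in adj.items():
        for col in cols:
            if col not in matched_cols:
                matched_cols.add(col)
                matching[row] = col
                break
    return matching


if __name__ == "__main__":
    # A maximum matching here has size 3 (0-1, 1-0, 2-2),
    # but the greedy pass stops at size 2.
    graph = {0: [0, 1], 1: [0], 2: [1, 2]}
    print(greedy_maximal_matching(graph))
```

On this small graph the greedy result has two edges while three are possible, an approximation ratio of 2/3.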
Ensuring Data Durability with Increasingly Interdependent Content
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.33
Veronica Estrada Galinanes, P. Felber
Abstract: Data entanglement is a novel approach to generate and propagate redundancy across multiple disk nodes in a fault-tolerant data store. In this paper, we analyse and evaluate helical entanglement codes (HEC), an XOR-based erasure coding algorithm that constructs long sequences of entangled data using incoming data and stored parities. The robust topology guarantees low complexity and a greater resilience to failures than previous codes mentioned in the literature, however, the code pattern requires a minimum fixed amount of storage overhead. A unique characteristic of HEC is that fault tolerance depends on the number of distinct helical strands (p), a parameter that could be changed on the fly and does not add significantly more storage. A p-HEC setting can tolerate arbitrary 5+p failures. Decoding has a low reconstruction cost and good locality. Besides, a deep repair mechanism exploits the available global parities. We perform experiments to compare the repairability of HEC with other codes and present analytical results of its reliability.
Citations: 4
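The abstract above describes building entangled sequences by XOR-ing incoming data with stored parities. The sketch below shows that idea on a single chain, where each new parity is the incoming block XOR-ed with the previous parity. This single-chain layout is an assumption made for illustration and is much simpler than the helical, multi-strand topology of HEC; the function names are hypothetical.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def entangle(blocks):
    """Return a chain of parities with p[i] = d[i] XOR p[i-1], p[-1] = zeros.

    Assumes all blocks have the same length.
    """
    parity = bytes(len(blocks[0]))
    parities = []
    for block in blocks:
        parity = xor_bytes(block, parity)
        parities.append(parity)
    return parities


def recover(index, parities):
    """Rebuild data block `index` from its two neighbouring parities."""
    previous = parities[index - 1] if index > 0 else bytes(len(parities[0]))
    return xor_bytes(parities[index], previous)


if __name__ == "__main__":
    data = [b"ab", b"cd", b"ef"]
    chain = entangle(data)
    print([recover(i, chain) for i in range(len(data))])
```

Because XOR is its own inverse, a lost data block can be rebuilt from just two adjacent parities, which is one way to picture the low reconstruction cost and locality the abstract mentions.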
Detecting Thread-Safety Violations in Hybrid OpenMP/MPI Programs
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.70
Hongyi Ma, Liqiang Wang, K. Krishnamoorthy
Abstract: We propose an approach by integrating static and dynamic program analyses to detect thread-safety violations in hybrid MPI/OpenMP programs. We innovatively transform the thread-safety violation problems to race conditions problems. In our approach, the static analysis identifies a list of MPI calls related to thread-safety violations, then replaces them with our own MPI wrappers, which involve accesses to some specific shared variables. The static analysis avoids instrumenting unrelated code, which significantly reduces runtime overhead. In the dynamic analysis, both happen-before and lockset-based race detection algorithms are used to detect races on these aforementioned shared variables. By detecting races, we can identify thread-safety violations according to their specifications. Our experimental evaluation over real-world applications shows that our approach is both accurate and efficient.
Citations: 12
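Lockset-based race detection, one of the two techniques named in the abstract above, is usually explained as follows: for each shared variable, keep the intersection of the lock sets held at every access, and report a potential race once that intersection becomes empty. The sketch below implements only this basic refinement rule. It is a simplified illustration, not the authors' tool: it omits the initialization and read-sharing states that practical detectors add, and the trace format and the name `lockset_check` are hypothetical.

```python
def lockset_check(accesses):
    """Return the set of variables whose candidate lockset became empty.

    accesses is an iterable of (variable, locks_held) pairs in trace order,
    where locks_held is a set of lock names held during that access.
    """
    candidates = {}
    suspicious = set()
    for variable, locks_held in accesses:
        if variable in candidates:
            # Keep only locks that protected every access seen so far.
            candidates[variable] &= set(locks_held)
        else:
            candidates[variable] = set(locks_held)
        if not candidates[variable]:
            suspicious.add(variable)
    return suspicious


if __name__ == "__main__":
    trace = [
        ("x", {"m"}),
        ("x", {"m", "n"}),  # x is always protected by m
        ("y", {"m"}),
        ("y", {"n"}),       # y has no common lock
    ]
    print(lockset_check(trace))
```

In the sample trace, `x` keeps the common lock `m`, while `y` is accessed under disjoint locks and is flagged.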
Scaling Data Intensive Physics Applications to 10k Cores on Non-dedicated Clusters with Lobster
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.53
A. Woodard, M. Wolf, C. Müller, N. Valls, Benjamín Tovar, P. Donnelly, Peter Ivie, K. H. Anampa, P. Brenner, D. Thain, K. Lannon, M. Hildreth
Abstract: The high energy physics (HEP) community relies upon a global network of computing and data centers to analyze data produced by multiple experiments at the Large Hadron Collider (LHC). However, this global network does not satisfy all research needs. Ambitious researchers often wish to harness computing resources that are not integrated into the global network, including private clusters, commercial clouds, and other production grids. To enable these use cases, we have constructed Lobster, a system for deploying data intensive high throughput applications on non-dedicated clusters. This requires solving multiple problems related to non-dedicated resources, including work decomposition, software delivery, concurrency management, data access, data merging, and performance troubleshooting. With these techniques, we demonstrate Lobster running effectively on 10k cores, producing throughput at a level comparable with some of the largest dedicated clusters in the LHC infrastructure.
Citations: 11
Can Cloud Service Get His Family? A Step Towards Service Family Detecting
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.80
Xinkui Zhao, Jianwei Yin, Chen Zhi, Pengxiang Lin, Zuoning Chen
Abstract: In cloud computing environment, an application is always composed of several service components. A collection of service components is called a service family, and we name the cloud service components as service family members. In this paper, we propose a solution named Icebreaker to assemble service components belonging to the same application without sniffing tenants' privacy. Icebreaker characterizes each service component with basic resource consuming information and proposes a new distance calculating algorithm named iEntropy to distinct service components. We adaptively adopt Affinity Propagation (AP) clustering algorithm and maximum Silhouette index to identify the number of service family and assemble the service family members. Experiments are conducted on RUBiS, Hadoop and ApacheBench clusters with 169 VMs. Evaluation results show that Icebreaker can get 96.45% accuracy.
Citations: 0
RDMA-Based Direct Transfer of File Data to Remote Page Cache
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.40
Shin Sasaki, Kazushi Takahashi, Y. Oyama, O. Tatebe
Abstract: The performance of a distributed file system significantly affects data-intensive applications that frequently execute I/O operations on large amounts of data. Although many modern distributed file systems are geared to provide highly efficient I/O performance, their operations are nonetheless affected by runtime overhead in data transfer between client nodes and I/O servers. A large part of the overhead is caused by memory copies executed by the client interface using the FUSE framework or a special kernel module. In this paper, we propose a method based on InfiniBand RDMA that improves data transfer performance between client and server in a distributed file system. The major characteristic of the method is that it transfers file data directly from a server's memory to the page cache of a client node. The method minimizes memory copies that are otherwise executed in the client interface or the operating system kernel. We implemented the proposed method in the Gfarm distributed file system and tested it using I/O benchmark software and real applications. The experimental results showed that our method effected a performance improvement of up to 78.4% and 256.0% in sequential and random file reads, respectively, and an improvement of up to 6.3% in data-intensive applications.
Citations: 3
VEF Traces: A Framework for Modelling MPI Traffic in Interconnection Network Simulators
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.141
Francisco J. Andújar, Juan A. Villar, J. L. Sánchez, F. J. Alfaro, J. Escudero-Sahuquillo
Abstract: Simulation is often used to evaluate the behaviour and measure the performance of computing systems. Specifically, in high-performance interconnection networks, the simulation has been extensively considered to verify the behaviour of the network itself and to evaluate its performance. In this context, network simulation must be fed with network traffic, also referred to as network workload, whose nature has been traditionally synthetic. These workloads can be used for the purpose of driving studies on network performance, but often such workloads are not accurate enough if a realistic evaluation is pursued. For this reason, other non-synthetic workloads have gained popularity over last decades since they are best to capture the realistic behaviour of existing applications. In this paper, we present the VEF traces framework, a self-related trace model, and all their associated tools. The main novelty of this framework is that, unlike existing ones, it does not provide a network simulation framework, but only offers an MPI task simulation framework, which allows one to use the MPI-based network traffic by any third-party network simulator, since this framework does not depend on any specific simulation platform.
Citations: 18
Performance Evaluation of Unstructured Mesh Physics on Advanced Architectures
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.126
C. Ferenbaugh
Abstract: Unstructured mesh physics codes tend to exhibit different performance characteristics than other types of codes such as structured mesh or particle codes, due to their heavy use of indirection arrays and their irregular memory access patterns. For this reason unstructured mesh mini-apps are needed, alongside other types of mini-apps, to evaluate new architectures and hardware features. This paper uses one such mini-app, PENNANT, to investigate performance trends on architectures such as the Intel Xeon Phi, IBM BlueGene/Q, and NVIDIA K40 GPU. We present basic results comparing the performance of these platforms to each other and to traditional multicore CPUs. We also study the usefulness for unstructured codes of various hardware features such as hardware threading, advanced vector instructions, and fast atomic operations.
Citations: 0