2008 IEEE International Conference on Cluster Computing: Latest Publications

Scalable, high performance InfiniBand-attached SAN Volume Controller
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663807
D. S. Guthridge
Abstract: We have developed a highly reliable, InfiniBand host-attached block storage management and virtualization system that supports several off-the-shelf Fibre Channel RAID controllers on the back end. The system is based on the existing IBM TotalStorage SAN Volume Controller (SVC) product and therefore offers high performance, a wide array of storage virtualization features, and support for many existing storage controllers. We provide an overview of the driver design as well as performance results. Large-read performance from the SVC cache exceeds 3 GB/s in a minimal two-node cluster configuration.
Citations: 2
Improving message passing over Ethernet with I/OAT copy offload in Open-MX
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663775
Brice Goglin
Abstract: Open-MX is a new message passing layer implemented on top of the generic Ethernet stack of the Linux kernel. Open-MX works on all Ethernet hardware, but it suffers from expensive memory-copy requirements on the receiver side due to the hardware's inability to deposit messages directly into the target application buffers.
A conceptual sketch of receive-side copy offload follows this entry.
Citations: 14
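The receive-side bottleneck named in the abstract is the copy from kernel receive buffers into the application buffer; I/OAT offloads that copy to a DMA engine so the CPU can keep processing incoming packets. The Python sketch below is only a conceptual illustration of that overlap, with a worker thread standing in for the DMA engine; it is not Open-MX code, and all names in it are made up for illustration.

    # Conceptual illustration of receive-side copy offload (not Open-MX code).
    # A worker thread stands in for the I/OAT DMA engine: fragment copies are
    # submitted asynchronously so the "CPU" can keep handling incoming fragments.
    from concurrent.futures import ThreadPoolExecutor

    FRAGMENT_SIZE = 64 * 1024
    NUM_FRAGMENTS = 16

    def copy_fragment(dst, offset, fragment):
        """Stand-in for a DMA copy into the application receive buffer."""
        dst[offset:offset + len(fragment)] = fragment

    def receive_with_offload(fragments):
        app_buffer = bytearray(FRAGMENT_SIZE * NUM_FRAGMENTS)
        with ThreadPoolExecutor(max_workers=1) as dma_engine:
            pending = []
            for i, frag in enumerate(fragments):
                # Submit the copy and immediately return to "protocol processing"
                # for the next fragment instead of blocking on a memcpy.
                pending.append(dma_engine.submit(copy_fragment, app_buffer,
                                                 i * FRAGMENT_SIZE, frag))
            for p in pending:      # completion check, analogous to polling the
                p.result()         # DMA engine before notifying the MPI layer
        return app_buffer

    if __name__ == "__main__":
        frags = [bytes([i]) * FRAGMENT_SIZE for i in range(NUM_FRAGMENTS)]
        buf = receive_with_offload(frags)
        assert buf[:FRAGMENT_SIZE] == frags[0]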
Divisible load scheduling with improved asymptotic optimality
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663779
R. Suda
Abstract: The divisible load model admits scheduling algorithms that achieve a nearly optimal makespan with practical computational complexity. Beaumont et al. have shown that their algorithm produces a schedule whose makespan is within a factor of 1 + O(1/√T) of the optimal solution as the total amount of tasks T scales up with the other conditions fixed. We have proposed an extension of their algorithm to multiple masters and processors of heterogeneous performance, though limited to uniform network performance. This paper analyzes the asymptotic performance of our algorithm and shows that it is 1 + O(1/√T), 1 + O(log T / T), or 1 + O(1/T), depending on the problem. In the latter two cases, our algorithm asymptotically outperforms the algorithm by Beaumont et al.
The bounds are restated symbolically after this entry.
Citations: 0
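For readability, the approximation ratios quoted in the abstract can be written as follows. The notation here is chosen for this listing (the paper may use different symbols): T is the total task volume, C(T) the makespan produced by an algorithm, and C*(T) the optimal makespan.

    % Approximation-ratio form of the asymptotic results quoted above.
    % Requires amsmath for \begin{cases}.
    % C(T)  : makespan of the schedule produced by the algorithm
    % C*(T) : optimal makespan for total task volume T
    \[
      \frac{C(T)}{C^{*}(T)} \;\le\; 1 + O\!\left(\tfrac{1}{\sqrt{T}}\right)
      \quad\text{(Beaumont et al.)}
    \]
    \[
      \frac{C(T)}{C^{*}(T)} \;\le\;
      \begin{cases}
        1 + O\!\left(\tfrac{1}{\sqrt{T}}\right),\\[2pt]
        1 + O\!\left(\tfrac{\log T}{T}\right),\\[2pt]
        1 + O\!\left(\tfrac{1}{T}\right),
      \end{cases}
      \quad\text{depending on the problem instance (this paper).}
    \]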
Exploiting data compression in collective I/O techniques
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663811
Rosa Filgueira, D. E. Singh, J. C. Pichel, J. Carretero
Abstract: This paper presents Two-Phase Compressed I/O (TPC I/O), an optimization of the Two-Phase collective I/O technique from ROMIO, the most popular MPI-IO implementation. To reduce network traffic, TPC I/O employs the LZO algorithm to compress and decompress the data exchanged in the inter-node communication operations. The compression algorithm has been fully integrated into the MPI collective technique, allowing compression to be enabled or disabled dynamically. Compared with Two-Phase I/O, Two-Phase Compressed I/O achieves significant improvements in overall execution time for many of the scenarios considered.
A toy compress-before-exchange sketch follows this entry.
Citations: 6
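The core idea, compress the exchanged buffers only when it pays off, can be illustrated in a few lines. This is a minimal sketch, not the TPC I/O implementation: it uses Python's zlib as a stand-in for LZO, and the function names, the 0.9 threshold, and the flag-based framing are assumptions made here for illustration.

    # Toy illustration of compressing data before an inter-node exchange, in the
    # spirit of Two-Phase Compressed I/O. zlib stands in for LZO; the real TPC
    # I/O logic lives inside the MPI collective, not at this level.
    import zlib

    def pack_for_exchange(buffer: bytes, min_ratio: float = 0.9):
        """Compress a buffer if it pays off; otherwise send it unchanged.

        Returns (payload, compressed_flag); the flag would travel with the
        message so the receiver knows whether to decompress.
        """
        compressed = zlib.compress(buffer, level=1)   # fast, LZO-like setting
        if len(compressed) < min_ratio * len(buffer):
            return compressed, True
        return buffer, False

    def unpack_after_exchange(payload: bytes, compressed_flag: bool) -> bytes:
        return zlib.decompress(payload) if compressed_flag else payload

    if __name__ == "__main__":
        data = bytes(1000) + b"irregular tail"        # highly compressible block
        payload, flag = pack_for_exchange(data)
        assert unpack_after_exchange(payload, flag) == data
        print(f"sent {len(payload)} bytes instead of {len(data)} (compressed={flag})")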
Multistage switches are not crossbars: Effects of static routing in high-performance networks
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663762
T. Hoefler, Timo Schneider, A. Lumsdaine
Abstract: Multistage interconnection networks based on central switches are ubiquitous in high-performance computing. Applications and communication libraries typically use such networks without considering the actual internal characteristics of the switch. However, application performance on these networks, particularly with respect to bisection bandwidth, does depend on the communication paths through the switch. In this paper we discuss the limitations of the hardware (capacity-based) definition of bisection bandwidth and introduce a new metric: effective bisection bandwidth. We assess the effective bisection bandwidth of several large-scale production clusters by simulating artificial communication patterns on them. Networks with full bisection bandwidth typically provided effective bisection bandwidth in the range of 55-60%. Simulations with application-based patterns showed that the difference between effective and rated bisection bandwidth could impact overall application performance by up to 12%.
A toy illustration of the metric follows this entry.
Citations: 110
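The gap between rated and effective bisection bandwidth comes from static routes colliding on shared links. The sketch below illustrates the idea of the metric on a small hypothetical 2-level fat tree (4 leaf switches of 4 nodes, 4 spines, destination-based uplink selection): draw random pairwise patterns, charge each flow the reciprocal of its most congested link, and average. The topology, routing rule, and trial count are assumptions; this is not the simulator or the clusters used in the paper.

    # Toy estimate of "effective bisection bandwidth" on a hypothetical 2-level
    # fat tree with static, destination-based uplink selection.
    import random
    from collections import Counter

    NODES, LEAVES, SPINES = 16, 4, 4

    def leaf_of(node):
        return node // (NODES // LEAVES)

    def spine_of(dst):
        return dst % SPINES                      # assumed static routing rule

    def effective_bw(trials=2000):
        total, flows = 0.0, 0
        for _ in range(trials):
            perm = list(range(NODES))
            random.shuffle(perm)                 # random pairwise pattern
            link_load = Counter()
            paths = []
            for src, dst in enumerate(perm):
                if src == dst or leaf_of(src) == leaf_of(dst):
                    continue                     # only inter-switch traffic counts
                up = ("up", leaf_of(src), spine_of(dst))
                down = ("down", spine_of(dst), leaf_of(dst))
                link_load[up] += 1
                link_load[down] += 1
                paths.append((up, down))
            for up, down in paths:
                # each flow is throttled by its most congested link
                total += 1.0 / max(link_load[up], link_load[down])
                flows += 1
        return total / flows

    if __name__ == "__main__":
        print(f"effective bisection bandwidth is about {effective_bw():.0%} of rated")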
DifferStore: A differentiated storage service in object-based storage system
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663770
Q. Wei, Zhixiang Li
Abstract: This paper presents DifferStore, a differentiated storage service for object-based storage systems. To enable differentiated storage service for different applications on a single object-based storage platform, DifferStore uses a two-layer architecture that efficiently decouples upper-layer, application-specific storage policies from lower-layer, application-independent storage functions. For the lower, application-independent layer, this paper proposes a weight-based object I/O scheduler with differentiated scheduling policies for different request classes, as well as a versatile storage manager. The storage manager implements differentiated storage policies for disk layout and free-space allocation, along with an efficient object namespace that allows an object's on-disk data to be accessed directly from its object ID. DifferStore also allows the upper, application-specific layer to assign complex striping, placement, and load-balancing policies as well as file-specific metadata structures. Experimental evaluation of our user-space prototype demonstrates that DifferStore performs well under mixed workloads and satisfies the requirements of different applications.
A generic sketch of class-weighted scheduling follows this entry.
Citations: 4
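The abstract does not detail the weight-based scheduler, so the following is only a generic sketch of differentiated scheduling: requests are queued per class and dispatched in proportion to per-class weights (a weighted round-robin). The class names, weights, and interface are made up here and are not DifferStore's actual policy.

    # Generic sketch of weight-based, class-differentiated request scheduling
    # (weighted round-robin). Classes and weights are illustrative assumptions.
    from collections import deque

    class WeightedScheduler:
        def __init__(self, weights):
            # e.g. {"latency_sensitive": 3, "background": 1}
            self.weights = dict(weights)
            self.queues = {cls: deque() for cls in weights}

        def submit(self, cls, request):
            self.queues[cls].append(request)

        def dispatch_round(self):
            """Dispatch up to `weight` requests from each class per round."""
            dispatched = []
            for cls, weight in self.weights.items():
                q = self.queues[cls]
                for _ in range(min(weight, len(q))):
                    dispatched.append((cls, q.popleft()))
            return dispatched

    if __name__ == "__main__":
        sched = WeightedScheduler({"latency_sensitive": 3, "background": 1})
        for i in range(4):
            sched.submit("latency_sensitive", f"read-{i}")
            sched.submit("background", f"scrub-{i}")
        print(sched.dispatch_round())   # 3 latency-sensitive reads, 1 scrub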
An OSD-based approach to managing directory operations in parallel file systems
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663769
N. Ali, A. Devulapalli, D. Dalessandro, P. Wyckoff, P. Sadayappan
Abstract: Distributed file systems that use multiple servers to store data in parallel are becoming commonplace. Much work has already gone into maximizing the data throughput of such systems; metadata management, however, has historically been treated as an afterthought. In previous work we focused on improving metadata management by placing file metadata along with data on object-based storage devices (OSDs), but we did not investigate directory operations. This work looks at the possibility of designing directory structures directly on OSDs, without intervening servers. In particular, the need for atomicity is a fundamental requirement that we explore in depth. Through performance results from benchmarks and applications we show the feasibility of using OSDs directly for metadata, including directory operations.
A sketch of why atomic directory updates are needed follows this entry.
Citations: 22
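To make the atomicity requirement concrete: when clients update a shared directory object directly, with no metadata server to serialize them, a plain read-modify-write can lose entries. Below is a minimal sketch of a lost-update-safe insert using an optimistic version check; the object interface shown is a hypothetical stand-in, not the OSD command set used in the paper.

    # Why atomicity matters for server-less directory updates: two clients that
    # read-modify-write the same directory object can lose an entry unless the
    # write is conditional on the version they read (compare-and-swap style).
    class DirectoryObject:
        def __init__(self):
            self.entries = {}      # name -> object id
            self.version = 0

        def read(self):
            return dict(self.entries), self.version

        def conditional_write(self, entries, expected_version):
            """Succeeds only if nobody else wrote since our read."""
            if self.version != expected_version:
                return False
            self.entries, self.version = dict(entries), self.version + 1
            return True

    def atomic_insert(directory, name, oid):
        while True:                            # retry on conflict
            entries, version = directory.read()
            entries[name] = oid
            if directory.conditional_write(entries, version):
                return

    if __name__ == "__main__":
        d = DirectoryObject()
        atomic_insert(d, "paper.pdf", 42)
        atomic_insert(d, "data.bin", 43)
        assert d.read()[0] == {"paper.pdf": 42, "data.bin": 43}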
Continuous adaptation for high performance throughput computing across distributed clusters
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663797
E. Walker
Abstract: A job proxy is an abstraction for provisioning CPU resources. This paper proposes an adaptive algorithm for allocating job proxies to distributed host clusters with the objective of improving the throughput of large-scale job ensembles. Specifically, the paper proposes a decision metric for selecting appropriate pending job proxies for migration between host clusters, and a self-synchronizing, Paxos-style distributed consensus algorithm for performing the migration of the selected job proxies. The algorithm is further described in the context of a concrete application, the MyCluster system, which implements a framework for submitting, managing, and adapting job proxies across distributed high-performance computing (HPC) clusters. To date, the system has been used to provision many hundreds of thousands of CPUs for computational experiments requiring high throughput on HPC infrastructures such as the NSF TeraGrid. Experimental evaluation of the proposed algorithm shows significant improvement in user job throughput: an average of 8% in simulation and 15% in a real-world experiment.
An illustrative migration rule follows this entry.
Citations: 3
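The abstract names a decision metric for migrating pending job proxies but does not define it, so the rule below is only a plausible illustration and is not the MyCluster metric: migrate a pending proxy when another cluster's estimated start time (queue wait plus resubmission overhead) clearly beats the remaining expected wait where it is queued. All names, fields, and numbers are assumptions.

    # Illustrative (assumed) migration rule for pending job proxies; not the
    # actual decision metric from the paper.
    from dataclasses import dataclass

    @dataclass
    class Cluster:
        name: str
        est_queue_wait_s: float      # estimated wait for a newly submitted proxy
        resubmit_overhead_s: float   # cost of cancelling and resubmitting there

    def choose_target(current_wait_s, clusters, margin=1.2):
        """Return the best migration target, or None if staying put wins.

        `margin` demands a clear improvement so proxies do not thrash between
        clusters whose estimates are nearly equal.
        """
        best = None
        for c in clusters:
            cost = c.est_queue_wait_s + c.resubmit_overhead_s
            if cost * margin < current_wait_s and (best is None or cost < best[1]):
                best = (c, cost)
        return best[0] if best else None

    if __name__ == "__main__":
        others = [Cluster("cluster_a", 600, 120), Cluster("cluster_b", 3000, 120)]
        target = choose_target(current_wait_s=1800, clusters=others)
        print("migrate to", target.name if target else "nowhere")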
Context-aware address translation for high performance SMP cluster system
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663784
Moon-Sang Lee, Joonwon Lee, S. Maeng
Abstract: User-level communication allows an application process to access the network interface directly. Bypassing the kernel requires that a user process access the network interface using its own virtual addresses, which must be translated to physical addresses. A small caching structure, similar to the hardware TLB on the host processor, has been used in network interface memory to cache the mappings between virtual and physical addresses. In this study, we propose a new TLB architecture for the network interface. The proposed architecture splits the original caching structure into as many partitions as there are processors in the SMP system and assigns a separate partition to each application process. In addition, the architecture is aware of user contexts and switches the contents of the caching structure on context switches. According to our experiments, our scheme achieves a significant reduction in application execution time compared to the previous approach.
A minimal model of the partitioned translation cache follows this entry.
Citations: 0
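A minimal model of the mechanism described in the abstract: the NIC's translation cache is split into one partition per context, and a context switch selects which partition subsequent lookups use, so concurrent processes stop evicting each other's entries. The partition size, LRU eviction, 4 KiB pages, and the interface are assumptions for illustration, not the paper's hardware design.

    # Minimal model of a context-aware, partitioned NIC translation cache.
    # One partition per application context; a context switch selects which
    # partition lookups use. LRU eviction and sizes are assumptions.
    from collections import OrderedDict

    class PartitionedTLB:
        def __init__(self, num_partitions, entries_per_partition=64):
            self.partitions = [OrderedDict() for _ in range(num_partitions)]
            self.size = entries_per_partition
            self.active = 0

        def context_switch(self, partition_id):
            """Called when the host scheduler switches the running process."""
            self.active = partition_id

        def translate(self, vaddr, page_table):
            part = self.partitions[self.active]
            page = vaddr >> 12                    # assume 4 KiB pages
            if page in part:                      # hit: no host interaction needed
                part.move_to_end(page)
                return part[page] | (vaddr & 0xFFF)
            paddr_page = page_table[page]         # miss: fetch mapping from host
            if len(part) >= self.size:
                part.popitem(last=False)          # evict least recently used entry
            part[page] = paddr_page
            return paddr_page | (vaddr & 0xFFF)

    if __name__ == "__main__":
        tlb = PartitionedTLB(num_partitions=2)
        page_table_a = {0x1000 >> 12: 0xABC000}
        tlb.context_switch(0)
        assert tlb.translate(0x1234, page_table_a) == 0xABC234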
Design and implementation of an effective HyperTransport core in FPGA
2008 IEEE International Conference on Cluster Computing | Pub Date: 2008-10-31 | DOI: 10.1109/CLUSTR.2008.4663805
Fei Chen, Hailiang Cheng, Xiaojun Yang, R. Liu
Abstract: This paper presents the design and implementation of a HyperTransport (HT) core in a Lattice SCM FPGA that runs at an 800 MHz DDR link frequency. An effective approach is also proposed to solve the ordering problem caused by different virtual channels, which exists not only in HT but also in PCI Express. HT is a high-performance, low-latency I/O standard that can connect directly to some general-purpose processors, such as AMD's Opteron processor family. The HT interface on Opteron processors runs at up to 1 GHz, whereas most HT cores in FPGAs run at no more than 500 MHz, which limits communication performance. In this paper, a 16-bit, 800 MHz HT core is proposed to reduce the gap between ASIC and FPGA implementations.
Citations: 1