A shared virtual memory network with fast remote direct memory access and message passing

Gang Shi, Mingchang Hu, Hongda Yin, Weiwu Hu, Zhimin Tang
Published in: 2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935), 2004-09-20
DOI: 10.1109/CLUSTR.2004.1392660 (https://doi.org/10.1109/CLUSTR.2004.1392660)
Citations: 0

Abstract

Communication overhead has become one of the bottlenecks of shared virtual memory (SVM). Many methods have been applied to improve SVM performance, but they have not delivered the expected improvement. To make fuller use of the communication hardware and reduce unnecessary overhead, this work designs and implements a prototype network with RDMA capability, named FRAMP (virtual memory based Fast Remote direct memory Access and Message Passing network). FRAMP comprises a crossbar-based switch, a custom host network interface, and a user-level communication protocol, all tightly coupled and deliberately balanced. At the system driver level, FRAMP achieves 3.7 μs one-way latency and 6.0 μs RDMA read latency. At the user API level, with a multi-threaded programming method, it achieves 5.6 μs one-way latency, 2.0 μs ping-ping latency, and 125 MB/s asymptotic bandwidth. At the user level, a remote memory read takes only 8.0 μs for 8 bytes and 39 μs for a 4096-byte page. The obtained bandwidth is close to the hardware limit of the experimental environment, which is based on a 33 MHz, 32-bit PCI bus; PCI bus utilization is 94%. SVM performance on the FRAMP network is very good with pure message passing, but not as good when RDMA read is used to fetch faulted pages.
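
The bandwidth figures in the abstract can be sanity-checked with a short calculation. The sketch below is not from the paper: it assumes the "33MHz" PCI clock is the nominal 33.33 MHz, that 1 MB means 10^6 bytes, and that the peak is one 4-byte transfer per clock. The function names are illustrative only.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the abstract.
# Assumptions (not stated in the source): nominal PCI clock of 33.33 MHz,
# 1 MB = 10**6 bytes, and a peak of one bus-width transfer per clock cycle.

PCI_CLOCK_HZ = 33.33e6     # nominal clock of the "33MHz" PCI bus (assumed)
PCI_WIDTH_BYTES = 4        # a 32-bit bus moves 4 bytes per clock
MEASURED_MB_PER_S = 125.0  # asymptotic user-level bandwidth from the abstract


def pci_peak_mb_per_s(clock_hz: float = PCI_CLOCK_HZ,
                      width_bytes: int = PCI_WIDTH_BYTES) -> float:
    """Theoretical peak bus bandwidth in MB/s."""
    return clock_hz * width_bytes / 1e6


def utilization(measured_mb_per_s: float, peak_mb_per_s: float) -> float:
    """Fraction of the theoretical peak that was actually achieved."""
    return measured_mb_per_s / peak_mb_per_s


def effective_mb_per_s(nbytes: int, latency_us: float) -> float:
    """Throughput of a single read; bytes per microsecond equals MB/s."""
    return nbytes / latency_us


if __name__ == "__main__":
    peak = pci_peak_mb_per_s()
    print(f"PCI peak bandwidth: {peak:.2f} MB/s")
    print(f"Bus utilization:    {utilization(MEASURED_MB_PER_S, peak):.0%}")
    print(f"4096-byte read:     {effective_mb_per_s(4096, 39.0):.1f} MB/s")
    print(f"8-byte read:        {effective_mb_per_s(8, 8.0):.1f} MB/s")
```

Under these assumptions the peak is about 133 MB/s, so 125 MB/s corresponds to roughly 94% utilization, which matches the figure reported in the abstract. The single-read numbers show how fixed per-request latency dominates small transfers.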