Gang Shi, Mingchang Hu, Hongda Yin, Weiwu Hu, Zhimin Tang
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935), 2004-09-20. DOI: 10.1109/CLUSTR.2004.1392660 (https://doi.org/10.1109/CLUSTR.2004.1392660)
A shared virtual memory network with fast remote direct memory access and message passing
Communication overhead has become one of the bottlenecks of SVM (shared virtual memory). Many methods have been proposed to improve SVM performance, but they have not delivered the expected gains. To make fuller use of the communication hardware and cut unnecessary overhead, this work designs and implements a prototype with RDMA capability, named FRAMP (virtual memory based Fast Remote direct memory Access and Message Passing network). FRAMP comprises a crossbar-based switch, a custom host network interface, and a user-level communication protocol, all tightly coupled and deliberately balanced. At the system driver level, FRAMP achieves 3.7 μs one-way latency and 6.0 μs RDMA read latency. At the user API level, with a multi-threaded programming method, it achieves 5.6 μs one-way latency, 2.0 μs ping-ping latency, and 125 MB/s asymptotic bandwidth. At the user level, a remote memory read takes only 8.0 μs for 8 bytes and 39 μs for a 4096-byte page. The measured bandwidth is close to the hardware limit of the experimental environment, which is based on a 33 MHz, 32-bit PCI bus; PCI bus utilization is 94%. SVM performance on the FRAMP network is very good with pure message passing, but not as good when RDMA read is used to fetch faulting pages.
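The bandwidth figures in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the nominal PCI clock of 33.33 MHz, one 32-bit transfer per clock, and 1 MB = 10^6 bytes, and it reads the page-fetch latency as 39 microseconds. The function names are hypothetical.

```python
# Sanity check of the abstract's bandwidth figures (illustrative sketch).
# Assumptions not stated in the abstract: nominal PCI clock of 33.33 MHz,
# one 32-bit transfer per clock, and 1 MB = 10**6 bytes.

PCI_CLOCK_HZ = 100e6 / 3   # nominal "33 MHz" PCI clock (assumed value)
PCI_WIDTH_BYTES = 4        # 32-bit bus moves 4 bytes per transfer


def pci_peak_mb_per_s(clock_hz=PCI_CLOCK_HZ, width_bytes=PCI_WIDTH_BYTES):
    """Theoretical peak bus bandwidth in MB/s."""
    return clock_hz * width_bytes / 1e6


def utilization(measured_mb_per_s, peak_mb_per_s):
    """Fraction of the theoretical peak that was actually achieved."""
    return measured_mb_per_s / peak_mb_per_s


def effective_mb_per_s(num_bytes, latency_us):
    """Effective bandwidth of one transfer; bytes per microsecond equals MB/s."""
    return num_bytes / latency_us


if __name__ == "__main__":
    peak = pci_peak_mb_per_s()
    print(f"PCI peak:            {peak:.1f} MB/s")
    print(f"Utilization at 125:  {utilization(125, peak):.1%}")
    print(f"4096-byte page read: {effective_mb_per_s(4096, 39):.1f} MB/s")
```

Under these assumptions the peak comes out near 133 MB/s, so 125 MB/s corresponds to about 94% utilization, which matches the stated figure. A single 4096-byte remote read in 39 microseconds works out to roughly 105 MB/s, below the asymptotic bandwidth, as expected for a transfer that still pays the fixed per-request latency.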