The design and implementation of an asynchronous communication mechanism for the MPI communication model

Motohiko Matsuda, T. Kudoh, Hiroshi Tazuka, Y. Ishikawa

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935), September 20, 2004. DOI: 10.1109/CLUSTR.2004.1392597
Many implementations of MPI communication libraries are built on top of the socket interface, which is based on connection-oriented stream communication. This work addresses a mismatch between the MPI communication model and the socket interface. To overcome this mismatch and implement an efficient MPI library for large-scale commodity clusters, a new communication mechanism, called O2G, is designed and implemented. O2G integrates the receive queue management of MPI into the TCP/IP protocol handler without modifying the protocol stack. Received data is extracted from the TCP receive buffer and copied into user space within the TCP/IP protocol handler, which is invoked by interrupts. This completely avoids polling of sockets and reduces system call overhead, which becomes dominant in large-scale clusters. In addition, the immediate, asynchronous receive operation avoids disruption of the message flow caused by a shortage of capacity in the receive buffer, and keeps bandwidth high. An evaluation using the NAS Parallel Benchmarks shows that O2G made an MPI implementation up to 30 percent faster than the original. An evaluation of bandwidth also shows that O2G made the MPI implementation's performance independent of the number of connections, whereas a socket-based implementation was strongly affected by it.
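
The paper does not include source code, so the following is only an illustrative sketch of the receive-queue management the abstract refers to, written as the user-space logic a typical MPI library performs and that O2G moves into the interrupt-driven TCP/IP protocol handler: an incoming message is matched against the queue of posted receives by (source, tag) envelope, and its payload is copied directly into the user buffer. All structure and function names (posted_recv, deliver, ANY_SOURCE) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define ANY_SOURCE (-1)
#define ANY_TAG    (-1)

struct posted_recv {
    int source, tag;          /* matching criteria (MPI message envelope) */
    void *user_buf;           /* destination buffer in user space */
    size_t len;
    struct posted_recv *next;
};

static struct posted_recv *posted_q;   /* receives posted by e.g. MPI_Irecv */

/* Called when a message with envelope (source, tag) arrives. In O2G this
 * logic would run inside the protocol handler invoked by interrupts, so no
 * socket polling is needed to make progress on pending receives. */
static int deliver(int source, int tag, const void *data, size_t len)
{
    struct posted_recv **pp;
    for (pp = &posted_q; *pp != NULL; pp = &(*pp)->next) {
        struct posted_recv *r = *pp;
        if ((r->source == source || r->source == ANY_SOURCE) &&
            (r->tag == tag || r->tag == ANY_TAG)) {
            size_t n = len < r->len ? len : r->len;
            memcpy(r->user_buf, data, n);   /* copy straight into user buffer */
            *pp = r->next;                  /* unlink the matched receive */
            free(r);
            return 1;                       /* matched */
        }
    }
    return 0;  /* no match: a real library would queue it as "unexpected" */
}
```

Draining the TCP receive buffer as soon as data arrives, as this matching step implies, is also what prevents the buffer-capacity stalls the abstract mentions: the sender is never throttled by a full receive window.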
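For contrast, here is a hedged sketch of the conventional socket-based progress engine that O2G is evaluated against, assuming one TCP connection per peer process. Each call to poll() (like select()) makes the kernel scan every descriptor, so the per-call cost grows with the number of connections, which is consistent with the abstract's observation that socket-based bandwidth degrades with connection count. Function and parameter names are invented for illustration.

```c
#include <poll.h>
#include <unistd.h>

/* Poll all peer sockets once and drain whichever are readable. */
void progress_poll(int *socks, int nprocs, char *scratch, int scratch_len)
{
    struct pollfd fds[nprocs];            /* one entry per peer: O(nprocs) */
    for (int i = 0; i < nprocs; i++) {
        fds[i].fd = socks[i];
        fds[i].events = POLLIN;
    }
    if (poll(fds, nprocs, 0) <= 0)        /* kernel scans every descriptor */
        return;
    for (int i = 0; i < nprocs; i++) {
        if (fds[i].revents & POLLIN) {
            /* system call per ready socket; matching then happens in user space */
            ssize_t n = read(socks[i], scratch, scratch_len);
            (void)n;
        }
    }
}
```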