TCP/IP Performance Near I/O Bus Bandwidth on Multi-Core Systems: 10-Gigabit Ethernet vs. Multi-Port Gigabit Ethernet
Hyun-Wook Jin, Yeon-Ji Yun, Hye-Churn Jang
2008 International Conference on Parallel Processing - Workshops, 2008-09-08. DOI: 10.1109/ICPP-W.2008.33
With significant advances in the network interfaces, I/O buses, and processor architectures of end nodes, innovative approaches are required to achieve high network bandwidth by fully utilizing the available system resources. The related issues fall into two categories: (i) utilizing I/O bus bandwidth for high-bandwidth network connections and (ii) utilizing multiple cores for high packet-processing throughput. In this paper, we conduct several experiments on a multi-core system with 10 GigE and multi-port 1 GigE network interfaces. We aim to show the impact of system configuration on network performance and to compare the performance of the two network interfaces. The experimental results show that, with proper interrupt affinity configuration, the multi-port 1 GigE can achieve bandwidth comparable to that of 10 GigE. The peak bandwidth achieved by the multi-port 1 GigE is 6.7 Gbps, which is more than 80% of the theoretical maximum I/O bus bandwidth of the experimental system. However, we also show that the multi-port 1 GigE can consume far more processor resources than 10 GigE. More importantly, we reveal that processing packets on many cores can increase resource consumption without much benefit. This may be due to locking overhead between softirqs running on different cores and to lower cache efficiency. We show that further tuning of the configuration cannot overcome this side effect.
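The abstract does not say how interrupt affinity was configured. As a minimal sketch only, and assuming a Linux host that exposes the standard /proc/irq/<irq>/smp_affinity interface, the following Python shows one way to pin the IRQ of each network port to a chosen set of cores. The function names and the IRQ numbers in the usage note are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List


def affinity_mask(cpus: Iterable[int]) -> str:
    """Return the hex bitmask string that selects the given CPU ids.

    Assumption: at most 32 CPUs. Larger systems use comma-separated
    32-bit groups, which this sketch does not handle.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            raise ValueError(f"unsupported CPU id: {cpu}")
        mask |= 1 << cpu
    return format(mask, "x")


def plan_affinity(irqs: List[int], cpus: List[int]) -> Dict[int, str]:
    """Assign each IRQ to exactly one CPU, cycling through `cpus`."""
    if not cpus:
        raise ValueError("need at least one CPU")
    return {
        irq: affinity_mask([cpus[i % len(cpus)]])
        for i, irq in enumerate(irqs)
    }


def apply_affinity(plan: Dict[int, str]) -> None:
    """Write each mask to /proc/irq/<irq>/smp_affinity (Linux, needs root)."""
    for irq, mask in plan.items():
        with open(f"/proc/irq/{irq}/smp_affinity", "w") as f:
            f.write(mask)
```

For example, `plan_affinity([50, 51, 52, 53], [0, 1])` keeps four hypothetical port IRQs on two cores, while passing a longer CPU list spreads them further. Since the abstract reports that processing packets on many cores can cost more without much benefit, the size of the CPU list is left as the parameter to experiment with.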