{"title":"加速应答注入,消除gpgpu的NoC瓶颈","authors":"Yunfan Li, Lizhong Chen","doi":"10.1109/IPDPS47924.2020.00013","DOIUrl":null,"url":null,"abstract":"The high level of parallelism in GPGPUs has resulted in significantly changed on-chip data traffic behaviors. This demands new research to identify and address the limiting factors of networks-on-chip (NoCs) in the context of GPGPUs. In this paper, we quantitatively analyze the performance of on-chip networks in GPGPUs, and address a serious NoC bottleneck where the reply data from memory controllers experience large contention when being injected to the reply network. To remove this reply injection bottleneck, we propose Accelerated Reply Injection (ARI), a very effective scheme that can supply a fast rate of data traffic from memory controllers to feed the reply injection points, and accelerates the consumption of the injected packets by quickly transferring the packets out of the injection points, thus increasing both supply and consumption of reply traffic injection. Simulation results on a wide range of benchmarks show that the proposed ARI reduces the data stall time in memory controllers by 67.8% on average, and increases IPC by more than 15.4% on average, with less than 1% area overhead.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"96 1","pages":"22-31"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Accelerated Reply Injection for Removing NoC Bottleneck in GPGPUs\",\"authors\":\"Yunfan Li, Lizhong Chen\",\"doi\":\"10.1109/IPDPS47924.2020.00013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The high level of parallelism in GPGPUs has resulted in significantly changed on-chip data traffic behaviors. This demands new research to identify and address the limiting factors of networks-on-chip (NoCs) in the context of GPGPUs. In this paper, we quantitatively analyze the performance of on-chip networks in GPGPUs, and address a serious NoC bottleneck where the reply data from memory controllers experience large contention when being injected to the reply network. To remove this reply injection bottleneck, we propose Accelerated Reply Injection (ARI), a very effective scheme that can supply a fast rate of data traffic from memory controllers to feed the reply injection points, and accelerates the consumption of the injected packets by quickly transferring the packets out of the injection points, thus increasing both supply and consumption of reply traffic injection. 
Simulation results on a wide range of benchmarks show that the proposed ARI reduces the data stall time in memory controllers by 67.8% on average, and increases IPC by more than 15.4% on average, with less than 1% area overhead.\",\"PeriodicalId\":6805,\"journal\":{\"name\":\"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"96 1\",\"pages\":\"22-31\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS47924.2020.00013\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS47924.2020.00013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: The high degree of parallelism in GPGPUs has significantly changed on-chip data traffic behavior, demanding new research to identify and address the limiting factors of networks-on-chip (NoCs) in the context of GPGPUs. In this paper, we quantitatively analyze the performance of on-chip networks in GPGPUs and identify a serious NoC bottleneck: reply data from memory controllers experience heavy contention when injected into the reply network. To remove this reply injection bottleneck, we propose Accelerated Reply Injection (ARI), an effective scheme that supplies data traffic from memory controllers to the reply injection points at a fast rate and accelerates consumption of the injected packets by quickly transferring them out of the injection points, thereby increasing both the supply to and the consumption of reply traffic injection. Simulation results on a wide range of benchmarks show that ARI reduces data stall time in memory controllers by 67.8% on average and increases IPC by more than 15.4% on average, with less than 1% area overhead.
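
The supply-and-consumption framing in the abstract can be made concrete with a small, purely illustrative queueing sketch. This is not the paper's mechanism or its simulator; all rates, buffer sizes, and function names below are hypothetical. It models a memory controller feeding a finite reply-injection buffer that the NoC drains, and shows that the stall fraction drops only when both the feed rate into the buffer and the drain rate out of it are raised, which is the intuition behind accelerating both sides of reply injection.

# Hypothetical toy model (Python), not from the paper: a memory controller (MC)
# produces replies each cycle, pushes them into a finite injection buffer, and
# the reply network drains that buffer. We count cycles where replies are left
# stalled inside the MC because the buffer could not accept them fast enough.

def simulate(supply_per_cycle, drain_per_cycle, cycles=10_000,
             replies_per_cycle=1.2, buffer_slots=8):
    """Return the fraction of cycles in which replies remain stalled in the MC."""
    mc_backlog = 0.0   # replies produced but not yet handed to the injection point
    inj_buffer = 0.0   # replies sitting at the injection point, waiting for the NoC
    stalled = 0

    for _ in range(cycles):
        mc_backlog += replies_per_cycle                   # DRAM returns new replies
        inj_buffer -= min(inj_buffer, drain_per_cycle)    # NoC drains the buffer
        space = max(0.0, buffer_slots - inj_buffer)
        moved = min(mc_backlog, supply_per_cycle, space)  # MC feeds the buffer
        mc_backlog -= moved
        inj_buffer += moved
        if mc_backlog > 0:                                # replies stuck at the MC
            stalled += 1

    return stalled / cycles


if __name__ == "__main__":
    for label, sup, drn in [("baseline",            1.0, 1.0),
                            ("faster supply only",  2.0, 1.0),
                            ("faster drain only",   1.0, 2.0),
                            ("faster supply+drain", 2.0, 2.0)]:
        print(f"{label:22s} stall fraction = {simulate(sup, drn):.2f}")

With these illustrative parameters, the baseline and both single-sided variants leave replies stalled in the memory controller in nearly every cycle, while raising the supply and drain rates together drives the stall fraction to roughly zero, mirroring the abstract's point that both the feeding and the emptying of the injection points must be accelerated.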