{"title":"海量多核处理器分层缓存模型中内存通信流量的评估","authors":"Sharifa Al Khanjari, W. Vanderbauwhede","doi":"10.1109/PDP.2016.30","DOIUrl":null,"url":null,"abstract":"The scaling of semiconductor technologies is leading to processors with increasing numbers of cores. A key enabler in manycore systems is the use of Networks-on-Chip (NoC) as a global communication mechanism. The adoption of NoCs in manycore systems requires a shift in focus from computation to communication, as communication is fast becoming the dominant factor in processor performance. Many researchers have focused on direct communication between cores in the NoC, however in a manycore processor the communication is actually between the cores and the memory hierarchy. In this work, we investigate the memory communication traffic of shared threads in a hierarchical cache architecture. We argue that the performance scalability for shared-memory applications in a hierarchical cache architecture for systems with thousands of processor cores depends on the distance between threads sharing memory in terms of the cache hierarchy (the \"memory distance\"). We present latency and throughput results comparing fat quadtree, concentrated mesh and mesh topologies as a function of the \"memory distance\" between the threads. Our results using the ITRS physical data for 2023 show that the model of thread placement and the distance of placing them significantly affects the NoC performance, and that scale-invariant topologies perform better than flat topologies.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Evaluation of the Memory Communication Traffic in a Hierarchical Cache Model for Massively-Manycore Processors\",\"authors\":\"Sharifa Al Khanjari, W. Vanderbauwhede\",\"doi\":\"10.1109/PDP.2016.30\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The scaling of semiconductor technologies is leading to processors with increasing numbers of cores. A key enabler in manycore systems is the use of Networks-on-Chip (NoC) as a global communication mechanism. The adoption of NoCs in manycore systems requires a shift in focus from computation to communication, as communication is fast becoming the dominant factor in processor performance. Many researchers have focused on direct communication between cores in the NoC, however in a manycore processor the communication is actually between the cores and the memory hierarchy. In this work, we investigate the memory communication traffic of shared threads in a hierarchical cache architecture. We argue that the performance scalability for shared-memory applications in a hierarchical cache architecture for systems with thousands of processor cores depends on the distance between threads sharing memory in terms of the cache hierarchy (the \\\"memory distance\\\"). We present latency and throughput results comparing fat quadtree, concentrated mesh and mesh topologies as a function of the \\\"memory distance\\\" between the threads. 
Our results using the ITRS physical data for 2023 show that the model of thread placement and the distance of placing them significantly affects the NoC performance, and that scale-invariant topologies perform better than flat topologies.\",\"PeriodicalId\":192273,\"journal\":{\"name\":\"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)\",\"volume\":\"97 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-04-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDP.2016.30\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDP.2016.30","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Evaluation of the Memory Communication Traffic in a Hierarchical Cache Model for Massively-Manycore Processors
The scaling of semiconductor technologies is leading to processors with ever-increasing numbers of cores. A key enabler in manycore systems is the use of Networks-on-Chip (NoC) as a global communication mechanism. The adoption of NoCs in manycore systems requires a shift in focus from computation to communication, as communication is fast becoming the dominant factor in processor performance. Many researchers have focused on direct communication between cores in the NoC; however, in a manycore processor the communication actually takes place between the cores and the memory hierarchy. In this work, we investigate the memory communication traffic of shared threads in a hierarchical cache architecture. We argue that the performance scalability of shared-memory applications in a hierarchical cache architecture for systems with thousands of processor cores depends on the distance between the threads sharing memory in terms of the cache hierarchy (the "memory distance"). We present latency and throughput results comparing fat quadtree, concentrated mesh, and mesh topologies as a function of the "memory distance" between the threads. Our results, based on the ITRS physical data for 2023, show that the thread placement model and the memory distance between the placed threads significantly affect NoC performance, and that scale-invariant topologies outperform flat topologies.
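As a rough illustration of the "memory distance" notion described above (a minimal sketch only; the paper's exact metric and hierarchy parameters may differ), the snippet below assumes a balanced quadtree cache hierarchy in which every cache groups four children and cores are numbered consecutively. The distance is then the number of levels one must climb before two cores fall under the same shared cache.

```python
# Illustrative sketch (assumed quadtree hierarchy, not the authors' exact model):
# "memory distance" = number of cache levels to climb until two cores
# share a cache. 0 means the same core, 1 means the same lowest-level
# shared cache, and so on up the hierarchy.

def memory_distance(core_a: int, core_b: int, fanout: int = 4) -> int:
    """Return how many levels up the cache hierarchy the first cache
    shared by core_a and core_b sits (0 if the cores are identical)."""
    level = 0
    a, b = core_a, core_b
    while a != b:
        a //= fanout   # index of the cache one level above a
        b //= fanout   # index of the cache one level above b
        level += 1
    return level

if __name__ == "__main__":
    # Cores 0 and 3 sit under the same first shared cache -> distance 1.
    print(memory_distance(0, 3))   # 1
    # Cores 0 and 5 only meet one level higher -> distance 2.
    print(memory_distance(0, 5))   # 2
```

Under this reading, placing communicating threads on cores with a small memory distance keeps their shared traffic low in the hierarchy, which is the intuition behind comparing scale-invariant topologies (such as the fat quadtree) against flat mesh topologies.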