Authors: Qingchuan Shi, Farrukh Hijaz, O. Khan
Venue: 2013 IEEE 31st International Conference on Computer Design (ICCD)
Published: 2013-10-01
DOI: 10.1109/ICCD.2013.6657067 (https://doi.org/10.1109/ICCD.2013.6657067)
Towards efficient dynamic data placement in NoC-based multicores
Next-generation multicores will process massive data with significant sharing. Since future processors will also be inherently limited by off-chip bandwidth, on-chip data management is emerging as a first-order design constraint. On-chip memory latency increases as more cores are added, since the diameter of most on-chip networks grows with the number of cores. We observe that a large fraction of on-chip traffic originates from communication between the cores to maintain cache coherence. Motivated by these observations, we propose a novel on-chip data placement mechanism that optimizes shared data placement by minimizing the distance of data from the requesting cores (improving locality) while paying attention to load balancing, network contention, and the utilization of per-core cache capacity. Using simulations of a 64-core multicore, we show that our proposal outperforms state-of-the-art static and dynamic data placement mechanisms by an average of 5.5% and 8.5%, respectively.
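The abstract does not specify the placement algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: a square 2D mesh with row-major tile numbering, hop count measured as Manhattan distance, and a simple linear penalty for cache occupancy. The names `mesh_diameter`, `choose_home_tile`, `access_counts`, `occupancy`, and `capacity_weight` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Tuple


def tile_coords(tile: int, width: int) -> Tuple[int, int]:
    """Map a row-major tile id to (x, y) on a width x width mesh (assumed layout)."""
    return tile % width, tile // width


def hops(a: int, b: int, width: int) -> int:
    """Manhattan distance between two tiles, used here as a proxy for network latency."""
    ax, ay = tile_coords(a, width)
    bx, by = tile_coords(b, width)
    return abs(ax - bx) + abs(ay - by)


def mesh_diameter(num_cores: int) -> int:
    """Longest shortest path, in hops, on a square mesh: corner to opposite corner."""
    width = math.isqrt(num_cores)
    if width * width != num_cores:
        raise ValueError("this sketch assumes a square mesh")
    return 2 * (width - 1)


def choose_home_tile(
    access_counts: Dict[int, int],
    occupancy: Dict[int, float],
    width: int,
    capacity_weight: float = 0.0,
) -> int:
    """Pick the tile with the lowest access-weighted distance plus occupancy penalty.

    access_counts: requesting core id -> number of accesses to the data block.
    occupancy: tile id -> fraction of that tile's cache slice in use (0.0 to 1.0).
    capacity_weight: hypothetical knob trading locality against load balance.
    Ties are broken in favor of the lowest tile id.
    """
    best_tile, best_cost = 0, float("inf")
    for tile in range(width * width):
        distance_cost = sum(
            count * hops(core, tile, width)
            for core, count in access_counts.items()
        )
        cost = distance_cost + capacity_weight * occupancy.get(tile, 0.0)
        if cost < best_cost:
            best_tile, best_cost = tile, cost
    return best_tile


if __name__ == "__main__":
    # Diameter grows with core count: 16 cores vs. 64 cores.
    print(mesh_diameter(16), mesh_diameter(64))
    # Core 0 dominates the accesses, so the data is placed at tile 0.
    print(choose_home_tile({0: 10, 63: 1}, {}, width=8))
    # If tile 0's cache slice is full and heavily penalized, a neighbor is chosen.
    print(choose_home_tile({0: 10, 63: 1}, {0: 1.0}, width=8, capacity_weight=100.0))
```

In this sketch the distance term pulls shared data toward its heaviest requesters, while the occupancy term pushes it away from full cache slices. A real mechanism would also have to account for network contention and the cost of migrating data, which are omitted here.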