{"title":"分布式共享最后一级缓存的多核pareto最优功率和缓存感知任务映射","authors":"Martin Rapp, A. Pathania, J. Henkel","doi":"10.1145/3218603.3218630","DOIUrl":null,"url":null,"abstract":"Two factors primarily affect performance of multi-threaded tasks on many-core processors with both shared and physically distributed Last-Level Cache (LLC): the power budget associated with a certain task mapping that aims to guarantee thermally safe operation and the non-uniform LLC access latency of threads running on different cores. Spatially distributing threads across the many-core increases the power budget, but unfortunately also increases the associated LLC latency. On the other side, mapping more threads to cores near the center of the many-core decreases the LLC latency, but unfortunately also decreases the power budget. Consequently, both metrics (LLC latency and power budget) cannot be simultaneously optimal, which leads to a Pareto-optimization that has formerly not been exploited. We are the first to present a run-time task mapping algorithm called PCMap that exploits this trade-off. Our approach results in up to 8.6% reduction in the average task response time accompanied by a reduction of up to 8.5% in the energy consumption compared to the state-of-the-art.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Pareto-Optimal Power- and Cache-Aware Task Mapping for Many-Cores with Distributed Shared Last-Level Cache\",\"authors\":\"Martin Rapp, A. Pathania, J. Henkel\",\"doi\":\"10.1145/3218603.3218630\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Two factors primarily affect performance of multi-threaded tasks on many-core processors with both shared and physically distributed Last-Level Cache (LLC): the power budget associated with a certain task mapping that aims to guarantee thermally safe operation and the non-uniform LLC access latency of threads running on different cores. Spatially distributing threads across the many-core increases the power budget, but unfortunately also increases the associated LLC latency. On the other side, mapping more threads to cores near the center of the many-core decreases the LLC latency, but unfortunately also decreases the power budget. Consequently, both metrics (LLC latency and power budget) cannot be simultaneously optimal, which leads to a Pareto-optimization that has formerly not been exploited. We are the first to present a run-time task mapping algorithm called PCMap that exploits this trade-off. Our approach results in up to 8.6% reduction in the average task response time accompanied by a reduction of up to 8.5% in the energy consumption compared to the state-of-the-art.\",\"PeriodicalId\":20456,\"journal\":{\"name\":\"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-07-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3218603.3218630\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3218603.3218630","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Pareto-Optimal Power- and Cache-Aware Task Mapping for Many-Cores with Distributed Shared Last-Level Cache
Two factors primarily affect performance of multi-threaded tasks on many-core processors with both shared and physically distributed Last-Level Cache (LLC): the power budget associated with a certain task mapping that aims to guarantee thermally safe operation and the non-uniform LLC access latency of threads running on different cores. Spatially distributing threads across the many-core increases the power budget, but unfortunately also increases the associated LLC latency. On the other side, mapping more threads to cores near the center of the many-core decreases the LLC latency, but unfortunately also decreases the power budget. Consequently, both metrics (LLC latency and power budget) cannot be simultaneously optimal, which leads to a Pareto-optimization that has formerly not been exploited. We are the first to present a run-time task mapping algorithm called PCMap that exploits this trade-off. Our approach results in up to 8.6% reduction in the average task response time accompanied by a reduction of up to 8.5% in the energy consumption compared to the state-of-the-art.