Reducing energy and increasing performance with traffic optimization in many-core systems

International Workshop on System Level Interconnect Prediction | Pub Date: 2011-06-05 | DOI: 10.1109/SLIP.2011.6135429

George B. P. Bezerra, S. Forrest, P. Zarkesh-Ha


Citations: 3

Abstract



As the number of cores on a die continues to increase, it is necessary to optimize the traffic patterns of applications in order to minimize power consumption and maximize performance. We present a new approach for traffic optimization in many-core systems, which targets communication locality and load-balancing. Our approach works by mapping memory blocks to physical locations on the chip that are close to cores that access them, and by enforcing load balance by limiting the number of blocks mapped to each location. Communication locality reduces the average distance traveled by packets, which minimizes power and increases performance. Load-balancing avoids hotspots and improves cache utilization. Rather than treating every application in the same way, our method uses available information to produce mappings that are specially tuned for individual applications. Simulations performed on a 64-core system show a reduction in dynamic energy consumption of up to 81.6% and of 45.5% on average, and gains in performance of up to 13.2% on scientific benchmarks.
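The abstract describes the mapping idea only at a high level: place each memory block near the cores that access it, and cap the number of blocks per location. The sketch below illustrates that idea with a simple greedy heuristic. It is not the authors' algorithm. The 2D mesh layout, the Manhattan hop metric, the per-tile `capacity`, and the names `map_blocks` and `total_hops` are all assumptions made for illustration.

```python
from itertools import product


def manhattan(a, b):
    """Hop distance between two tiles on a 2D mesh (assumed topology)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def map_blocks(accesses, mesh_dim, capacity):
    """Greedy, capacity-constrained placement (illustrative only).

    accesses: {block_id: {(x, y) of core: access_count}}
    mesh_dim: side length of the square mesh of tiles
    capacity: maximum number of blocks allowed per tile (load balance)
    """
    tiles = list(product(range(mesh_dim), repeat=2))
    load = {t: 0 for t in tiles}
    mapping = {}
    # Place the most heavily accessed blocks first so they get the best tiles.
    order = sorted(accesses, key=lambda b: -sum(accesses[b].values()))
    for block in order:
        free = [t for t in tiles if load[t] < capacity]
        if not free:
            raise ValueError("not enough capacity for all blocks")
        # Locality: pick the free tile with the lowest access-weighted distance.
        best = min(
            free,
            key=lambda t: sum(n * manhattan(t, c) for c, n in accesses[block].items()),
        )
        mapping[block] = best
        load[best] += 1
    return mapping


def total_hops(mapping, accesses):
    """Access-weighted hop count, a rough proxy for network traffic."""
    return sum(
        n * manhattan(mapping[b], c)
        for b, cores in accesses.items()
        for c, n in cores.items()
    )


# Hypothetical access profile on a 2x2 mesh, one block per tile.
accesses = {
    "A": {(0, 0): 10},
    "B": {(0, 0): 5},
    "C": {(1, 1): 3},
}
mapping = map_blocks(accesses, mesh_dim=2, capacity=1)
print(mapping, total_hops(mapping, accesses))
```

With `capacity=1`, blocks A and B both prefer tile (0, 0), but only the more heavily accessed block A gets it; B is pushed to a neighboring tile. This is the trade-off between locality and load balance that the abstract refers to.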

