Thread Progress Aware Block Migration for Dynamic NUCA

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP). Pub Date: 2016-04-04. DOI: 10.1109/PDP.2016.17

Jianhua Li, Xin An, Yiming Ouyang, Wei Wang


Citations: 1

Abstract




Non-Uniform Cache Architecture (NUCA) is a viable way for large-capacity on-chip caches to manage increasing wire delay. Dynamic NUCA (D-NUCA) divides the last-level cache (LLC) into smaller cache banks connected by an on-chip network. D-NUCA achieves good performance by migrating blocks within bank sets at runtime to exploit data locality. Prior work has explored many aspects of D-NUCA, including block migration, mapping, and searching. However, all previous D-NUCA designs are thread-oblivious. Because of interference on shared resources, threads often make unbalanced progress, and the lagging threads are more critical to overall performance. In this paper, we propose a novel D-NUCA design called Thread prOgress aware block Migration (TOM). TOM uses dynamic thread criticality information to control block migration, aiming to speed up critical threads through biased block migration. Experimental results show that TOM reduces the execution time of a set of PARSEC applications while dissipating less energy than previous D-NUCA designs.
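The abstract does not describe TOM's mechanism in detail, so the following is only a minimal sketch of what "biased block migration" could look like, not the authors' design. It assumes a bank set laid out as a linear chain of bank indices, a one-bank-per-hit promotion step, and a per-thread progress counter in which a lower value marks a lagging (more critical) thread. The function name `migrate_target` and all of its parameters are hypothetical.

```python
def migrate_target(block_bank: int, requester_bank: int,
                   requester_progress: int, owner_progress: int) -> int:
    """Return the bank a block should occupy after a hit.

    Assumptions (illustrative only, not taken from the paper):
    - banks in a bank set are numbered along a line, and a core's
      closest bank is `requester_bank`;
    - progress is a counter where a smaller value means the thread
      is lagging and is therefore more critical;
    - `owner_progress` belongs to the thread the block was last
      migrated toward.
    """
    # The block already sits in the requester's closest bank.
    if block_bank == requester_bank:
        return block_bank

    # Bias: a thread that is ahead of the current owner may not pull
    # the block away from the more critical (lagging) thread.
    if requester_progress > owner_progress:
        return block_bank

    # Otherwise promote the block one bank toward the requester,
    # as in a gradual-promotion D-NUCA policy.
    step = 1 if requester_bank > block_bank else -1
    return block_bank + step


if __name__ == "__main__":
    # A lagging thread (progress 10) pulls the block one bank closer.
    print(migrate_target(3, 0, requester_progress=10, owner_progress=50))
    # A thread that is ahead (progress 50) leaves the block in place.
    print(migrate_target(3, 0, requester_progress=50, owner_progress=10))
```

A thread-oblivious policy would correspond to dropping the progress comparison, so that every hit promotes the block toward whichever core requested it.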

