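Effective hardware-level thread synchronization for high performance and power efficiency in application specific multi-threaded embedded processors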

2015 33rd IEEE International Conference on Computer Design (ICCD). Pub Date: 2015-10-18. DOI: 10.1109/ICCD.2015.7357119

M. Wickramasinghe, Hui Guo


Citations: 3

Abstract

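Multi-threaded processors interleave the execution of several threads to reduce processor stalling time. Because instructions are fetched so frequently, instruction cache misses usually account for a significant fraction of the overall stalling time. Apart from extending execution time (and hence directly increasing energy consumption), cache misses also cause indirect power overheads and increased thread switching because of the main memory accesses they trigger. Minimizing instruction cache misses is therefore important, especially when designing application-specific embedded processors, which tend to be compact and low-power. This paper aims to reduce instruction cache misses in a single-pipeline processor for embarrassingly parallel applications, in which the same code is executed by a number of independent threads on different data sets. Such a design can serve as a building-block processor for large multicomputer systems. We propose a micro-architectural-level multithreading control design that synchronizes thread execution so that cached instructions are maximally reused by all threads. Our experiments show that the design not only increases pipeline performance but also reduces memory access frequency, thereby achieving high energy efficiency.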
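The argument above hinges on one observation: when every thread runs the same code, a line fetched into the instruction cache by one thread can serve the others only if they reach it before it is evicted. The Python sketch below illustrates this with a toy model whose details are our own assumptions, not the paper's design: a fully associative instruction cache with LRU replacement, a straight-line program larger than the cache that every thread executes once, strict round-robin issue of one instruction per thread per turn, and an unsynchronized case modeled as threads that start at staggered program points. The function `count_icache_misses` and all parameter values are hypothetical.

```python
from collections import OrderedDict


def count_icache_misses(starts, program_len, line_size, num_lines):
    """Count instruction-cache misses for threads interleaved round-robin.

    Hypothetical toy model (not the paper's micro-architecture): every thread
    executes the same straight-line program (instruction addresses
    0..program_len-1, wrapping around), fetching one instruction per turn.
    starts[t] is thread t's initial PC. The cache is fully associative with
    LRU replacement and holds num_lines lines of line_size instructions.
    """
    cache = OrderedDict()  # block number -> None, ordered from LRU to MRU
    misses = 0
    for step in range(program_len):        # one full pass per thread
        for start in starts:               # round-robin interleaving
            pc = (start + step) % program_len
            block = pc // line_size
            if block in cache:
                cache.move_to_end(block)   # hit: mark most recently used
            else:
                misses += 1                # miss: fetch line from memory
                if len(cache) >= num_lines:
                    cache.popitem(last=False)  # evict least recently used
                cache[block] = None
    return misses


if __name__ == "__main__":
    N_THREADS, PROGRAM_LEN, LINE_SIZE, NUM_LINES = 4, 64, 4, 8
    synced = [0] * N_THREADS                        # all threads share one PC
    staggered = [16 * t for t in range(N_THREADS)]  # hypothetical 16-instruction drift
    print("synchronized:", count_icache_misses(synced, PROGRAM_LEN, LINE_SIZE, NUM_LINES))
    print("staggered:", count_icache_misses(staggered, PROGRAM_LEN, LINE_SIZE, NUM_LINES))
```

With these parameters, keeping the threads in lockstep costs 16 misses (one per 4-instruction line of the 64-instruction program), while staggering them by 16 instructions costs 64, because each thread fetches every line on its own after another thread's copy has been evicted. These figures come only from this toy model and are not the paper's experimental results.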



