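Effective hardware-level thread synchronization for high performance and power efficiency in application specific multi-threaded embedded processors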

M. Wickramasinghe, Hui Guo
Published in: 2015 33rd IEEE International Conference on Computer Design (ICCD)
Publication date: 2015-10-18
DOI: 10.1109/ICCD.2015.7357119 (https://doi.org/10.1109/ICCD.2015.7357119)
Citations: 3

Abstract

Multi-threaded processors interleave the execution of several threads to reduce processor stalling time. Instruction cache misses usually account for a significant fraction of the overall stalling time due to frequent instruction fetches. Apart from incurring extended execution time (hence its direct impact on energy consumption), cache misses also lead to indirect power overheads and increased thread switching due to resulting main memory accesses. Therefore, minimizing instruction cache misses is important especially in designing application specific embedded processors that tend to be compact in size and consume low power. This paper aims to reduce instruction cache misses in a single pipeline processor for applications that offer embarrassing parallelism and enable the same code to be executed by a number of independent threads on different data sets. Such a design can be used as a building block processor for large multicomputer systems. We propose a micro-architectural level multithreading control design, which synchronizes the thread execution to allow cached instructions to be maximally reused by all threads. Our experiments show that our design not only increases the pipeline performance but also reduces the memory access frequency, hence effectively achieving high energy efficiency.
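
The instruction-reuse idea in the abstract can be illustrated with a toy software model. The sketch below is not the authors' micro-architectural design. It only assumes a small fully associative instruction cache with LRU replacement and a round-robin interleaving of threads that all run the same code. All names and parameters (`ICache`, `run`, `count_misses`, `LINE_SIZE`, `CACHE_LINES`, the skew value) are hypothetical and chosen for illustration. It compares threads kept at the same program counter with threads that have drifted apart in the code.

```python
from collections import OrderedDict

LINE_SIZE = 4    # instructions per cache line (assumed value)
CACHE_LINES = 8  # cache capacity in lines (assumed value)


class ICache:
    """Toy fully associative instruction cache with LRU replacement."""

    def __init__(self, num_lines=CACHE_LINES, line_size=LINE_SIZE):
        self.num_lines = num_lines
        self.line_size = line_size
        self.lines = OrderedDict()  # line tag -> True, ordered oldest to newest
        self.misses = 0

    def fetch(self, pc):
        tag = pc // self.line_size
        if tag in self.lines:
            # Hit: mark the line as most recently used.
            self.lines.move_to_end(tag)
        else:
            # Miss: load the line and evict the least recently used one if full.
            self.misses += 1
            self.lines[tag] = True
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)


def run(program_len, start_pcs):
    """Interleave threads round-robin, one instruction per thread per step.

    Each thread executes program_len instructions of the same code, starting
    at its own entry in start_pcs and wrapping around (a loop body).
    Returns the number of instruction cache misses.
    """
    cache = ICache()
    for step in range(program_len):
        for start in start_pcs:
            cache.fetch((start + step) % program_len)
    return cache.misses


def count_misses(program_len, num_threads, skew):
    """skew == 0 models synchronized threads; skew > 0 models drifted threads."""
    return run(program_len, [t * skew for t in range(num_threads)])


if __name__ == "__main__":
    print("synchronized:", count_misses(64, 4, skew=0))
    print("skewed:      ", count_misses(64, 4, skew=16))
```

With these assumed parameters the synchronized schedule misses once per cache line (16 misses for 256 fetches), because the first thread loads a line and the others reuse it immediately. The skewed schedule misses 64 times, because the four threads stream through different parts of a program that is larger than the cache and evict each other's lines before they can be reused. Real hardware behaviour depends on cache organization, thread switching policy, and data-dependent control flow, none of which this model captures.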