An embedded co-processor architecture for energy-efficient stream computing

Amrit Panda, K. Chatha
{"title":"An embedded co-processor architecture for energy-efficient stream computing","authors":"Amrit Panda, K. Chatha","doi":"10.1109/ESTIMedia.2014.6962346","DOIUrl":null,"url":null,"abstract":"Stream processing has emerged as an important model of computation in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and exhibit large amounts of data and instruction level parallelism. Streaming applications are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movements. We present StreamEngine, an embedded architecture for energy-efficient computation of stream kernels. StreamEngine introduces an instruction locking mechanism that exploits the iterative nature of the kernels and enables fine-grain instruction reuse. We also adopt a Context-aware Dataflow Execution model to exploit instruction-level and data-level parallelism within the stream kernels. Each instruction in StreamEngine is locked to a Reservation Station and maintains a context that is updated upon execution; thus instructions never retire from the RS. The entire kernel is hosted in RS Banks close to functional units for energy-efficient instruction and operand delivery. We evaluate the performance and energy-efficiency of our architecture for stream kernel benchmarks by implementing the architecture with TSMC 45nm process, and comparison with an embedded RISC processor.","PeriodicalId":265392,"journal":{"name":"2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ESTIMedia.2014.6962346","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Stream processing has emerged as an important model of computation in the context of multimedia and communication sub-systems of embedded System-on-Chip (SoC) architectures. The dataflow nature of streaming applications allows them to be most naturally expressed as a set of kernels iteratively operating on continuous streams of data. The kernels are computationally intensive and exhibit large amounts of data- and instruction-level parallelism. Streaming applications are mainly characterized by real-time constraints that demand high throughput and data bandwidth with limited global data reuse. Conventional architectures fail to meet these demands due to their poorly matched execution models and the overheads associated with instruction and data movement. We present StreamEngine, an embedded architecture for energy-efficient computation of stream kernels. StreamEngine introduces an instruction-locking mechanism that exploits the iterative nature of the kernels and enables fine-grained instruction reuse. We also adopt a Context-aware Dataflow Execution model to exploit instruction-level and data-level parallelism within the stream kernels. Each instruction in StreamEngine is locked to a Reservation Station (RS) and maintains a context that is updated upon execution; thus instructions never retire from the RS. The entire kernel is hosted in RS Banks close to the functional units for energy-efficient instruction and operand delivery. We evaluate the performance and energy efficiency of our architecture on stream kernel benchmarks by implementing it in a TSMC 45 nm process and comparing it with an embedded RISC processor.
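To make the locked-instruction idea concrete, the following is a minimal software sketch of how a reservation-station entry that never retires might behave: the slot is bound to one opcode, fires dataflow-style when its operands arrive, and carries its context (result, iteration count) across iterations instead of being deallocated. All names (rs_entry, rs_bank, fire) and the bank size are hypothetical illustrations, not the paper's hardware interface or implementation.

```c
/*
 * Illustrative sketch only: a software model of a "locked" reservation-station
 * entry. The paper describes a hardware mechanism; this code merely mimics the
 * behavior (fire on operand readiness, update context in place, never retire).
 */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define RS_BANK_SIZE 8          /* assumed bank depth, not from the paper */

typedef struct {
    uint8_t  opcode;            /* operation this slot is locked to            */
    int32_t  operand[2];        /* latest operands delivered to the slot       */
    bool     ready[2];          /* per-operand readiness (dataflow firing)     */
    int32_t  result;            /* context: updated on every execution;        */
    uint32_t iteration;         /* the entry itself never retires              */
} rs_entry;

typedef struct {
    rs_entry slot[RS_BANK_SIZE];  /* the whole kernel resides in RS banks */
} rs_bank;

/* Fire a locked instruction when both operands are ready. Instead of retiring,
 * the entry's context is updated in place, ready for the next iteration. */
static bool fire(rs_entry *e)
{
    if (!(e->ready[0] && e->ready[1]))
        return false;
    switch (e->opcode) {
    case 0: e->result = e->operand[0] + e->operand[1]; break;  /* add */
    case 1: e->result = e->operand[0] * e->operand[1]; break;  /* mul */
    default: return false;
    }
    e->iteration++;                       /* context carried across iterations */
    e->ready[0] = e->ready[1] = false;    /* wait for the next stream inputs   */
    return true;
}

int main(void)
{
    rs_entry mul = { .opcode = 1 };
    for (int i = 1; i <= 4; i++) {        /* four iterations of the stream */
        mul.operand[0] = i;
        mul.operand[1] = i;
        mul.ready[0] = mul.ready[1] = true;
        fire(&mul);
        printf("iter %u: result %d\n", (unsigned)mul.iteration, (int)mul.result);
    }
    return 0;
}
```

The point of the sketch is the contrast with a conventional out-of-order pipeline, where a reservation-station entry is freed once its instruction issues; here the slot persists for the kernel's lifetime, so instruction fetch and decode costs are paid once rather than on every iteration.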