Run-Time Scheduled Hardware Acceleration of MPEG-4 Video Decoding

J. Boutellier, P. Jääskeläinen, O. Silvén
Published in: 2007 International Symposium on System-on-Chip, 2007-11-01
DOI: 10.1109/ISSOC.2007.4427425 (https://doi.org/10.1109/ISSOC.2007.4427425)
Citations: 3

Abstract

In this paper we present a hardware-accelerated system-on-chip implementation of an MPEG-4 Simple Profile video decoder with a novel hardware accelerator interfacing methodology. The system consists of a general-purpose master processor and several slave hardware accelerators. The master processor and the hardware accelerators communicate without interrupts by using piecewise-static run-time scheduling. After the data content of each macroblock has been discovered, the master processor computes a short static schedule for the accelerators. This removes the need for the accelerators to interrupt the master processor when an assigned task is finished. Therefore, context save overheads in the master processor are avoided and energy efficiency improves. The accelerators execute functions that perform block-level decoding operations (IDCT, inverse quantization, etc.), which have deterministic execution times and can be scheduled statically. The task scheduling algorithm executed by the master processor takes into account the costs and restrictions of a shared memory with limited access capabilities and marks memory accesses separately in the schedule. The possible heterogeneity of the processing units is also taken into account. Tests show that the proposed scheme is feasible and can be used as an alternative to traditional synchronization methods.
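The abstract does not give the scheduling algorithm itself. As a rough illustration of the idea only, the sketch below builds a short static schedule for one macroblock with a simple greedy list scheduler. Everything in it is an assumption made for illustration and not taken from the paper: the names `Accelerator`, `Slot`, `schedule_macroblock` and `MEM_ACCESS_CYCLES`, the cycle counts, the single-port memory model, and the choice to model only the input load of each task as a memory access.

```python
from dataclasses import dataclass

# Hypothetical cost of one input transfer over the single shared-memory port.
MEM_ACCESS_CYCLES = 4


@dataclass
class Accelerator:
    name: str
    cycles: dict      # operation name -> deterministic execution time (cycles)
    free_at: int = 0  # first cycle at which this unit is idle


@dataclass
class Slot:
    unit: str   # accelerator name, or "mem" for a shared-memory transfer
    block: int
    op: str
    start: int
    end: int


def schedule_macroblock(coded_blocks, accelerators, ops=("iq", "idct")):
    """Greedily build a static schedule for one macroblock.

    Returns (slots, makespan). Only the input load of each task is modeled
    as a memory access; result write-back is left out to keep the sketch short.
    """
    for acc in accelerators:
        acc.free_at = 0
    mem_free_at = 0
    slots = []
    makespan = 0
    for block in coded_blocks:
        ready = 0  # cycle at which this block's previous operation finished
        for op in ops:
            best = None
            for acc in accelerators:
                if op not in acc.cycles:
                    continue  # heterogeneous units: this one cannot run op
                # The load waits for the data, the memory port, and the unit.
                load_start = max(ready, mem_free_at, acc.free_at)
                end = load_start + MEM_ACCESS_CYCLES + acc.cycles[op]
                if best is None or end < best[0]:
                    best = (end, load_start, acc)
            if best is None:
                raise ValueError(f"no accelerator supports {op!r}")
            end, load_start, acc = best
            compute_start = load_start + MEM_ACCESS_CYCLES
            # Memory accesses are recorded as separate entries in the schedule.
            slots.append(Slot("mem", block, op, load_start, compute_start))
            slots.append(Slot(acc.name, block, op, compute_start, end))
            mem_free_at = compute_start
            acc.free_at = end
            ready = end
            makespan = max(makespan, end)
    return slots, makespan


# Example: two coded blocks, one unit that runs both operations and one
# faster IDCT-only unit (cycle counts are made up).
units = [Accelerator("A", {"iq": 10, "idct": 30}), Accelerator("B", {"idct": 20})]
slots, makespan = schedule_macroblock([0, 1], units)
```

Because every execution time is deterministic, the master processor in such a scheme knows from `makespan` when the last result will be available and could simply read it after that many cycles, which is why no completion interrupt is needed.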