Run-Time Scheduled Hardware Acceleration of MPEG-4 Video Decoding

2007 International Symposium on System-on-Chip | Pub Date: 2007-11-01 | DOI: 10.1109/ISSOC.2007.4427425

J. Boutellier, P. Jääskeläinen, O. Silvén

Citations: 3

Abstract

In this paper we present a hardware-accelerated system-on-chip implementation of an MPEG-4 Simple Profile video decoder with a novel hardware accelerator interfacing methodology. The system consists of a general-purpose master processor and several slave hardware accelerators. The master processor and the accelerators communicate without interrupts by using piecewise-static run-time scheduling. After the data content of each macroblock has been discovered, the master processor computes a short static schedule for the accelerators. This removes the need for an accelerator to interrupt the master processor when its assigned task is finished. Context-save overheads in the master processor are therefore avoided and energy efficiency improves. The accelerators execute functions that perform block-level decoding operations (IDCT, inverse quantization, etc.), which have deterministic execution times and can be scheduled statically. The task scheduling algorithm executed by the master processor takes into account the costs and restrictions of a shared memory with limited access capabilities and marks memory accesses separately in the schedule. It also accommodates possible heterogeneity of the processing units. Tests show that the proposed scheme is feasible and can be used as an alternative to traditional synchronization methods.
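The idea of computing a short static schedule per macroblock can be illustrated with a small sketch. This is not the authors' algorithm: it is a plain greedy list scheduler, and the accelerator names (`acc_a`, `acc_b`), the cycle counts, the single-port memory model, and the fixed `mem_cost` are all assumptions made for illustration. It shows the three ingredients the abstract names: deterministic task durations, heterogeneous units (each task lists only the accelerators that can run it), and shared-memory accesses recorded as separate schedule entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    resource: str  # accelerator name, or "mem" for the shared memory port
    task: str
    start: int     # first cycle of the slot
    end: int       # first cycle after the slot


def earliest_gap(busy, earliest, length):
    """Earliest start >= `earliest` where `length` cycles fit between busy intervals."""
    start = earliest
    for b_start, b_end in sorted(busy):
        if start + length <= b_start:
            break            # the candidate fits before this interval
        if b_end > start:
            start = b_end    # collision: retry right after this interval
    return start


def build_schedule(tasks, deps, cost, mem_cost=2):
    """Greedy list scheduling (an illustrative stand-in, not the paper's method).

    tasks: task names in topological order
    deps:  task -> list of predecessor tasks
    cost:  task -> {accelerator: cycles}; a missing accelerator cannot run the task
    """
    accel_free = {a: 0 for per_task in cost.values() for a in per_task}
    mem_busy = []   # (start, end) intervals on the single memory port
    done = {}       # task -> cycle at which its result is back in shared memory
    slots = []
    for task in tasks:
        ready = max((done[d] for d in deps.get(task, [])), default=0)
        best = None
        for accel, cycles in cost[task].items():
            load = earliest_gap(mem_busy, max(ready, accel_free[accel]), mem_cost)
            run_end = load + mem_cost + cycles
            store = earliest_gap(mem_busy, run_end, mem_cost)
            if best is None or store < best[2]:
                best = (accel, load, store, run_end)
        accel, load, store, run_end = best
        mem_busy += [(load, load + mem_cost), (store, store + mem_cost)]
        accel_free[accel] = store + mem_cost  # unit holds its result until stored
        done[task] = store + mem_cost
        slots.append(Slot("mem", task + ":load", load, load + mem_cost))
        slots.append(Slot(accel, task, load + mem_cost, run_end))
        slots.append(Slot("mem", task + ":store", store, store + mem_cost))
    return slots


# Hypothetical macroblock with two coded blocks: inverse quantization, then IDCT.
tasks = ["iq0", "iq1", "idct0", "idct1"]
deps = {"idct0": ["iq0"], "idct1": ["iq1"]}
cost = {
    "iq0": {"acc_a": 4},
    "iq1": {"acc_a": 4},
    "idct0": {"acc_a": 10, "acc_b": 6},
    "idct1": {"acc_a": 10, "acc_b": 6},
}
schedule = build_schedule(tasks, deps, cost)
for slot in sorted(schedule, key=lambda s: (s.start, s.resource)):
    print(f"{slot.start:3d}-{slot.end:3d}  {slot.resource:5s}  {slot.task}")
```

Because every slot has a known start and end cycle, a master processor could in principle issue each command at its scheduled time instead of waiting for a completion interrupt, which is the property the abstract relies on.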

Source journal

2007 International Symposium on System-on-Chip

Self-citation rate

0.00%

Publication count