Refining instruction set architecture for high-performance multimedia processing in constrained environments

R. Lee, A. M. Fiskiran, Z. Shi, Xiao Yang
{"title":"Refining instruction set architecture for high-performance multimedia processing in constrained environments","authors":"R. Lee, A. M. Fiskiran, Z. Shi, Xiao Yang","doi":"10.1109/ASAP.2002.1030724","DOIUrl":null,"url":null,"abstract":"Multimedia processing in software has been significantly accelerated by the addition of subword-parallel instructions to the instruction set architectures (ISAs) of modem microprocessors. While some of these multimedia instructions are simple and effective, others are very complex, requiring large, special-purpose functional units that are not practical for constrained environments such as handheld multimedia information appliances. For such environments, low-power and low-cost are as important as the high performance required for real-time multimedia processing and the general-purpose programmability required to support an ever growing range of applications. In this paper, we introduce PLX, a concise ISA that selects the most useful features from the first two generations of multimedia instructions added to microprocessors, and explores new ISA features for high-performance yet low-cost multimedia processing with small footprint processors. PLX is unique in that it is designed from scratch as a fully subword-parallel architecture with novel features like datapath scalability from 32-bit to 128-bit words, and a new definition of predication for reducing conditional branches. We illustrate the use of PLX's architectural features with four frequently used multimedia kernels: discrete cosine transform, pixel padding, clip test and median filter. Our performance results show that a 64-bit PLX implementation achieves significant speedups compared to a basic 64-bit RISC processor and to IA-32 processors with MMX and SSE multimedia extensions. PLX's datapath scalability feature often provides an additional 2x speedup in a cost-effective way.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP.2002.1030724","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

Multimedia processing in software has been significantly accelerated by the addition of subword-parallel instructions to the instruction set architectures (ISAs) of modem microprocessors. While some of these multimedia instructions are simple and effective, others are very complex, requiring large, special-purpose functional units that are not practical for constrained environments such as handheld multimedia information appliances. For such environments, low-power and low-cost are as important as the high performance required for real-time multimedia processing and the general-purpose programmability required to support an ever growing range of applications. In this paper, we introduce PLX, a concise ISA that selects the most useful features from the first two generations of multimedia instructions added to microprocessors, and explores new ISA features for high-performance yet low-cost multimedia processing with small footprint processors. PLX is unique in that it is designed from scratch as a fully subword-parallel architecture with novel features like datapath scalability from 32-bit to 128-bit words, and a new definition of predication for reducing conditional branches. We illustrate the use of PLX's architectural features with four frequently used multimedia kernels: discrete cosine transform, pixel padding, clip test and median filter. Our performance results show that a 64-bit PLX implementation achieves significant speedups compared to a basic 64-bit RISC processor and to IA-32 processors with MMX and SSE multimedia extensions. PLX's datapath scalability feature often provides an additional 2x speedup in a cost-effective way.
改进约束环境下高性能多媒体处理的指令集体系结构
在现代微处理器的指令集结构(isa)中加入子字并行指令,大大加快了软件中的多媒体处理速度。虽然其中一些多媒体指令简单而有效,但其他的则非常复杂,需要大型的专用功能单元,这对于手持多媒体信息设备等受限环境是不实用的。对于这种环境,低功耗和低成本与实时多媒体处理所需的高性能和支持不断增长的应用程序所需的通用可编程性同样重要。在本文中,我们介绍了PLX,这是一种简明的ISA,它从添加到微处理器的前两代多媒体指令中选择最有用的功能,并探索了使用小占用处理器进行高性能但低成本多媒体处理的新ISA功能。PLX的独特之处在于,它是从头开始设计的完全子字并行架构,具有新颖的功能,如从32位到128位字的数据路径可伸缩性,以及用于减少条件分支的预测的新定义。我们用四种常用的多媒体内核来说明PLX的架构特征的使用:离散余弦变换、像素填充、剪辑测试和中值滤波器。我们的性能结果表明,与基本的64位RISC处理器和具有MMX和SSE多媒体扩展的IA-32处理器相比,64位PLX实现实现了显着的速度提升。PLX的数据路径可扩展性特性通常以经济有效的方式提供额外的2倍加速。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
4.00
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信