{"title":"Refining instruction set architecture for high-performance multimedia processing in constrained environments","authors":"R. Lee, A. M. Fiskiran, Z. Shi, Xiao Yang","doi":"10.1109/ASAP.2002.1030724","DOIUrl":null,"url":null,"abstract":"Multimedia processing in software has been significantly accelerated by the addition of subword-parallel instructions to the instruction set architectures (ISAs) of modem microprocessors. While some of these multimedia instructions are simple and effective, others are very complex, requiring large, special-purpose functional units that are not practical for constrained environments such as handheld multimedia information appliances. For such environments, low-power and low-cost are as important as the high performance required for real-time multimedia processing and the general-purpose programmability required to support an ever growing range of applications. In this paper, we introduce PLX, a concise ISA that selects the most useful features from the first two generations of multimedia instructions added to microprocessors, and explores new ISA features for high-performance yet low-cost multimedia processing with small footprint processors. PLX is unique in that it is designed from scratch as a fully subword-parallel architecture with novel features like datapath scalability from 32-bit to 128-bit words, and a new definition of predication for reducing conditional branches. We illustrate the use of PLX's architectural features with four frequently used multimedia kernels: discrete cosine transform, pixel padding, clip test and median filter. Our performance results show that a 64-bit PLX implementation achieves significant speedups compared to a basic 64-bit RISC processor and to IA-32 processors with MMX and SSE multimedia extensions. PLX's datapath scalability feature often provides an additional 2x speedup in a cost-effective way.","PeriodicalId":424082,"journal":{"name":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP.2002.1030724","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17
Abstract
Multimedia processing in software has been significantly accelerated by the addition of subword-parallel instructions to the instruction set architectures (ISAs) of modem microprocessors. While some of these multimedia instructions are simple and effective, others are very complex, requiring large, special-purpose functional units that are not practical for constrained environments such as handheld multimedia information appliances. For such environments, low-power and low-cost are as important as the high performance required for real-time multimedia processing and the general-purpose programmability required to support an ever growing range of applications. In this paper, we introduce PLX, a concise ISA that selects the most useful features from the first two generations of multimedia instructions added to microprocessors, and explores new ISA features for high-performance yet low-cost multimedia processing with small footprint processors. PLX is unique in that it is designed from scratch as a fully subword-parallel architecture with novel features like datapath scalability from 32-bit to 128-bit words, and a new definition of predication for reducing conditional branches. We illustrate the use of PLX's architectural features with four frequently used multimedia kernels: discrete cosine transform, pixel padding, clip test and median filter. Our performance results show that a 64-bit PLX implementation achieves significant speedups compared to a basic 64-bit RISC processor and to IA-32 processors with MMX and SSE multimedia extensions. PLX's datapath scalability feature often provides an additional 2x speedup in a cost-effective way.