Optimizing DSP and media benchmarks for Pentium 4: hardware and software issues
D. Eliemble
Proceedings. IEEE International Conference on Multimedia and Expo, pp. 109-112 vol.2, published 2002-11-07
DOI: 10.1109/ICME.2002.1035524 (https://doi.org/10.1109/ICME.2002.1035524)
By examining the speed-up obtained from Pentium 4 SIMD instructions on DSP kernels (FFT) and on two multimedia programs (the MPEG-2 codec and a matching pursuit video codec), we discuss the hardware and software issues that limit performance. In the present implementation of Intel SIMD instructions, the cost of unaligned memory accesses and the lack of instructions that sum the different parts of an XMM register limit the efficiency of dot products. In many DSP and multimedia applications, C programmers' habits often prevent compiler vectorization or complicate the inlining of assembly code.
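
The dot-product limitation mentioned above can be illustrated with a minimal sketch. It is not the paper's code: the function name `dot_product` is hypothetical, and it uses compiler intrinsics from `<xmmintrin.h>` rather than inlined assembly. It assumes single-precision data with no alignment guarantee, so it uses the unaligned load `_mm_loadu_ps`. Because the baseline SSE instructions have no single operation that sums the four lanes of an XMM register, the final reduction needs extra shuffle and add steps. On targets without SSE the function falls back to a plain scalar loop.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: dot product of two float arrays of length n.
 * No alignment of a or b is assumed. */
float dot_product(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;

#if HAVE_SSE
    /* Four partial sums, one per 32-bit lane of the XMM register. */
    __m128 acc = _mm_setzero_ps();

    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads: they work for any address but may be slower
         * than aligned loads on some processors. */
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }

    /* Horizontal sum built from shuffles and adds:
     * acc = [p0, p1, p2, p3]
     * hi  = [p2, p3, p2, p3]
     * s2  = [p0+p2, p1+p3, ...]
     * s1  = lane 1 of s2 copied to every lane
     * lane 0 of (s2 + s1) = p0+p1+p2+p3 */
    __m128 hi = _mm_movehl_ps(acc, acc);
    __m128 s2 = _mm_add_ps(acc, hi);
    __m128 s1 = _mm_shuffle_ps(s2, s2, _MM_SHUFFLE(1, 1, 1, 1));
    sum = _mm_cvtss_f32(_mm_add_ss(s2, s1));
#endif

    /* Scalar loop for the remaining elements (or for all of them
     * when SSE is not available). */
    for (; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

A caller would pass two arrays and their common length, for example `dot_product(x, y, 1024)`. The reduction runs once per call, so its relative cost is highest for short vectors, where the multiply-accumulate loop has few iterations over which to amortize it.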