{"title":"Parallel implementations of motion estimation algorithms using OpenCL","authors":"A. Heikkinen, Lance Fono","doi":"10.1109/ICDSP.2013.6622694","PeriodicalId":180360,"journal":{"name":"2013 18th International Conference on Digital Signal Processing (DSP)","volume":"9 1","pages":"0"},"publicationDate":"2013-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"3","platform":"Semanticscholar","PeriodicalName":"2013 18th International Conference on Digital Signal Processing (DSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDSP.2013.6622694"}
Parallel implementations of motion estimation algorithms using OpenCL
Parallel processors such as graphics processing units (GPUs) have emerged as co-processors for central processing units (CPUs) to accelerate a range of applications. Open Computing Language (OpenCL) is a framework for multiprocessing on heterogeneous platforms. In this paper we focus on motion estimation, which is the most time-consuming task in video coding. We study two motion estimation algorithms in terms of parallel execution. We implemented the full search algorithm and the hierarchical search algorithm both in OpenCL and in C. Our measurements show that the OpenCL-based implementations of the algorithms on the GPU achieve a speedup of nearly 10 times over the corresponding C implementations on a single CPU.
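The abstract does not include the authors' code, so the following is only a minimal C sketch of generic full search block matching, assuming the sum of absolute differences (SAD) as the matching cost. The names `MotionVector`, `block_sad`, and `full_search`, the 8-bit single-plane frame layout, and the block size and search range parameters are all assumptions made for illustration and are not taken from the paper.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result type: best displacement and its SAD cost. */
typedef struct {
    int dx, dy;
    unsigned sad;
} MotionVector;

/* SAD between the bs x bs block at (cx, cy) in the current frame
 * and the bs x bs block at (rx, ry) in the reference frame. */
static unsigned block_sad(const unsigned char *cur, const unsigned char *ref,
                          int stride, int cx, int cy, int rx, int ry, int bs)
{
    unsigned sad = 0;
    for (int y = 0; y < bs; y++) {
        for (int x = 0; x < bs; x++) {
            int a = cur[(cy + y) * stride + cx + x];
            int b = ref[(ry + y) * stride + rx + x];
            sad += (unsigned)abs(a - b);
        }
    }
    return sad;
}

/* Full search: test every displacement in [-range, range]^2 for the block
 * whose top-left corner is (bx, by), skipping candidates that fall outside
 * the reference frame. Assumes the frame stride equals its width. */
MotionVector full_search(const unsigned char *cur, const unsigned char *ref,
                         int width, int height,
                         int bx, int by, int bs, int range)
{
    assert(bs > 0 && range >= 0);
    MotionVector best = {0, 0, UINT_MAX};
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + bs > width || ry + bs > height)
                continue;
            unsigned sad = block_sad(cur, ref, width, bx, by, rx, ry, bs);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

In this sketch the cost of each candidate displacement, and the search for each block, is computed independently of the others. That independence is the kind of property a data-parallel framework such as OpenCL can exploit, though how the paper actually partitions the work is not stated in the abstract.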