Bart Pieters, Charles-Frederik Hollemeersch, J. D. Cock, W. D. Neve, P. Lambert, R. Walle
Proceedings of the 20th ACM international conference on Multimedia, published 2012-10-29. DOI: 10.1145/2393347.2396370 (https://doi.org/10.1145/2393347.2396370)
Parallel deblocking filtering in H.264/AVC using multiple CPUs and GPUs
Deblocking filtering in the H.264/AVC standard is computationally complex because the filter is highly content-adaptive. The filter also introduces a significant number of data dependencies, which makes parallelization non-trivial. Our previous work analyzed these dependencies and proposed a massively parallel implementation tailored for execution on a single GPU. In this paper, we extend that work with a parallel processing scheme that accelerates deblocking filtering on multiple CPU cores or GPUs. The scheme allows for standard-compliant filtering regardless of slice configuration. Results show that our multi-GPU implementation of the proposed scheme achieves faster-than-real-time deblocking at over 3794 frames per second for 1080p video pictures using three GPUs. A multi-core CPU implementation using 8 CPU cores achieves 1080p deblocking filtering at up to 695 frames per second.