Optimizing the Census Transform on CUDA enabled GPUs
C. Pantilie, S. Nedevschi
2012 IEEE 8th International Conference on Intelligent Computer Communication and Processing
Published: 2012-11-26
DOI: 10.1109/ICCP.2012.6356186 (https://doi.org/10.1109/ICCP.2012.6356186)
Citations: 7
Abstract
The Census Transform is one of the most widely used matching metrics in problems that involve correspondence search, such as stereo reconstruction and optical flow. Graphics processing units (GPUs) have become popular platforms for such computation-intensive applications, which expose a high degree of data parallelism. Their evolution into a platform for general-purpose computing, through the continuous addition of new hardware features, has improved performance for many applications, but it has also expanded the set of possible implementation choices to the point where guidelines alone are not sufficient for optimum performance. What is the best implementation in the case of the Census Transform? This paper answers that question by benchmarking all major possible implementations. Its aim is to provide an optimal implementation of the Census Transform on a current-generation graphics processing unit using the Compute Unified Device Architecture (CUDA). The results have value reaching far beyond the Census Transform and provide insight for applications where non-separable 2D convolutions are present.
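For readers unfamiliar with the metric, the following is a minimal CPU reference sketch of a census transform in plain Python. It is not the paper's CUDA implementation. The abstract does not state a window size, border policy, or comparison direction, so the 3x3 default window, edge clamping, and "neighbour darker than centre" test used here are all assumptions, and the names `census_transform` and `hamming_distance` are hypothetical.

```python
def census_transform(image, radius=1):
    """Return a census signature (an int used as a bit string) for every pixel.

    `image` is a 2D list of intensities. Assumption: neighbours that fall
    outside the image are clamped to the nearest edge pixel.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            bits = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre pixel is not compared with itself
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    # Append a 1 bit when the neighbour is darker than the centre.
                    bits = (bits << 1) | (1 if image[ny][nx] < center else 0)
            out[y][x] = bits
    return out


def hamming_distance(a, b):
    """Matching cost between two census signatures: the number of differing bits."""
    return bin(a ^ b).count("1")
```

Each output pixel depends only on its own neighbourhood, which is the data parallelism the abstract refers to. Because the signature records only the ordering of intensities, it is unchanged by any strictly increasing remapping of brightness, and two signatures are typically compared with the Hamming distance when searching for correspondences.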