Efficient template matching with variable size templates in CUDA
Nicholas Moore, M. Leeser, L. King
2010 IEEE 8th Symposium on Application Specific Processors (SASP), 2010-06-13
DOI: 10.1109/SASP.2010.5521142 (https://doi.org/10.1109/SASP.2010.5521142)
Citations: 1
Abstract
Graphics processing units (GPUs) offer significantly higher peak performance than CPUs, but only for a limited problem space. Even within this space, GPU solutions are often restricted to a set of specific problem instances or offer greatly varying performance for slightly different parameters. This makes it difficult to provide a library of GPU implementations that is adaptable to arbitrary inputs. This research is motivated by a MATLAB lung tumor tracking application that relies on two-dimensional correlation and uses large template sizes. While GPU-based template matching has been addressed in the past, template sizes were limited to specific, relatively small sizes that were not adequate for accelerating the target application. This paper discusses a CUDA implementation that supports large template sizes and is adaptable to arbitrary template dimensions. The implementation uses on-demand compilation of kernels and compile-time expansion of various kernel parameters to improve adaptability without sacrificing performance.
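The idea of compile-time expansion of kernel parameters can be illustrated with a small host-side C++ sketch. This is not the paper's CUDA code: the function names `correlate_at` and `correlate2d`, the row-major `std::vector<float>` layout, and the use of an unnormalized sum of products as the correlation measure are all assumptions made for illustration. The template dimensions `TH` and `TW` are non-type template parameters, so each template size gets its own specialized instantiation with loop bounds known to the compiler, which is roughly what an on-demand compilation step could produce once the template size is known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of products between a TH x TW template and the image window whose
// top-left corner is (row, col). TH and TW are compile-time constants
// (hypothetical parameter names), so both loop trip counts are known to the
// compiler and the loops can be unrolled.
template <std::size_t TH, std::size_t TW>
float correlate_at(const std::vector<float>& image, std::size_t image_w,
                   const std::vector<float>& tmpl,
                   std::size_t row, std::size_t col) {
    float acc = 0.0f;
    for (std::size_t r = 0; r < TH; ++r) {
        for (std::size_t c = 0; c < TW; ++c) {
            acc += image[(row + r) * image_w + (col + c)] * tmpl[r * TW + c];
        }
    }
    return acc;
}

// Correlation surface over every position where the template fits entirely
// inside the image. Assumption: unnormalized correlation, row-major storage.
// On a GPU, each output element would typically map to one thread or one
// block of threads; here the outer loops stand in for that mapping.
template <std::size_t TH, std::size_t TW>
std::vector<float> correlate2d(const std::vector<float>& image,
                               std::size_t image_h, std::size_t image_w,
                               const std::vector<float>& tmpl) {
    assert(image.size() == image_h * image_w);
    assert(tmpl.size() == TH * TW);
    assert(image_h >= TH && image_w >= TW);

    const std::size_t out_h = image_h - TH + 1;
    const std::size_t out_w = image_w - TW + 1;
    std::vector<float> out(out_h * out_w);
    for (std::size_t row = 0; row < out_h; ++row) {
        for (std::size_t col = 0; col < out_w; ++col) {
            out[row * out_w + col] =
                correlate_at<TH, TW>(image, image_w, tmpl, row, col);
        }
    }
    return out;
}
```

For a 3 x 3 image holding the values 1 through 9 in row-major order and a 2 x 2 template `{1, 0, 0, 1}`, calling `correlate2d<2, 2>(image, 3, 3, tmpl)` yields the four values 6, 8, 12, 14. The trade-off in this style is that every distinct template size needs its own instantiation, which is why pairing it with compilation on demand, rather than precompiling every size, is attractive when template dimensions are arbitrary.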