Efficient template matching with variable size templates in CUDA

2010 IEEE 8th Symposium on Application Specific Processors (SASP). Publication date: 2010-06-13. DOI: 10.1109/SASP.2010.5521142

Nicholas Moore, M. Leeser, L. King

Citations: 1

Abstract

Graphics processing units (GPUs) offer significantly higher peak performance than CPUs, but for a limited problem space. Even within this space, GPU solutions are often restricted to a set of specific problem instances or offer greatly varying performance for slightly different parameters. This makes providing a library of GPU implementations that is adaptable to arbitrary inputs a difficult task. This research is motivated by a MATLAB lung tumor tracking application that relies on two-dimensional correlation and uses large template sizes. While GPU-based template matching has been addressed in the past, template sizes were limited to specific, relatively small sizes and not acceptable for accelerating the target application. This paper discusses a CUDA implementation that supports large template sizes and is adaptable to arbitrary template dimensions. The implementation uses on-demand compilation of kernels and compile-time expansion of various kernel parameters to improve the implementation adaptability without sacrificing performance.
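The abstract's closing sentence describes kernels that are compiled on demand, with parameters such as the template dimensions fixed at compile time. A rough CPU-side analogy in Python (not the paper's CUDA code) is sketched below. The helper names `make_matcher` and `best_match` are hypothetical, and the similarity measure is a plain sum of products (unnormalized correlation) chosen for brevity; the abstract does not state the exact measure or kernel structure used in the paper.

```python
# Illustrative analogy only: run-time code generation specialized per template
# size, loosely mirroring on-demand kernel compilation. Not the paper's code.

_cache = {}  # (template_h, template_w) -> generated matcher function


def make_matcher(template_h, template_w):
    """Return a correlation function specialized for one template size."""
    key = (template_h, template_w)
    if key in _cache:
        return _cache[key]

    # Template dimensions are emitted as literals, so the generated loops
    # have fixed bounds (the analogue of compile-time parameter expansion).
    source = f"""
def match(image, template):
    rows = len(image) - {template_h} + 1
    cols = len(image[0]) - {template_w} + 1
    scores = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for i in range({template_h}):
                for j in range({template_w}):
                    acc += image[y + i][x + j] * template[i][j]
            scores[y][x] = acc
    return scores
"""
    namespace = {}
    code = compile(source, f"<matcher_{template_h}x{template_w}>", "exec")
    exec(code, namespace)
    _cache[key] = namespace["match"]
    return _cache[key]


def best_match(image, template):
    """Return the (row, col) offset with the highest correlation score.

    Assumes the template is no larger than the image in either dimension.
    """
    matcher = make_matcher(len(template), len(template[0]))
    scores = matcher(image, template)
    candidates = (
        (scores[y][x], (y, x))
        for y in range(len(scores))
        for x in range(len(scores[0]))
    )
    return max(candidates, key=lambda item: item[0])[1]
```

Calling `best_match(image, template)` with a template size not seen before triggers one code-generation step; later calls with the same size reuse the cached function, which is the trade-off on-demand compilation aims for.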
