An efficient deep template matching and in-plane pose estimation method via template-aware dynamic convolution

IF 7.5 | Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications, published 2025-09-23. DOI: 10.1016/j.eswa.2025.129813

Ke Jia, Ji Zhou, Hanxin Li, Zhigan Zhou, Haojie Chu, Xiaojie Li

Volume 298, Article 129813


Abstract

In industrial inspection and component alignment tasks, template matching requires efficient estimation of a target’s position and geometric state (rotation and scaling) under complex backgrounds to support precise downstream operations. Traditional methods rely on exhaustive enumeration of angles and scales, leading to low efficiency under compound transformations. Meanwhile, most deep learning-based approaches only estimate similarity scores without explicitly modeling geometric pose, making them inadequate for real-world deployment. To overcome these limitations, we propose a lightweight end-to-end framework that reformulates template matching as joint localization and geometric regression, outputting the center coordinates, rotation angle, and independent horizontal and vertical scales. A Template-Aware Dynamic Convolution Module (TDCM) injects template features at inference time to guide generalizable matching. The compact network integrates depthwise separable convolutions and pixel shuffle for efficient matching. To enable training without geometric annotations, we introduce a rotation-shear-based augmentation strategy with structure-aware pseudo labels. A lightweight refinement module further improves angle and scale precision via local optimization. Experiments show that our 3.07M-parameter model achieves high precision with an inference time of about 14 ms under compound transformations. It also demonstrates strong robustness in small-template and multi-object scenarios, making it well suited for deployment in real-time industrial applications. The code is available at: https://github.com/ZhouJ6610/PoseMatch-TDCM.
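The abstract describes the network's output as a five-parameter pose: center coordinates, a rotation angle, and independent horizontal and vertical scales. The sketch below shows one way such a prediction could be turned into the template's footprint in the image and, conversely, how the angle and scales could be read back from a 2×2 transform. It assumes the linear part is composed as A = R(θ)·diag(sx, sy), with angles in degrees and the template centered at the origin; the paper's actual convention may differ, and the function names `pose_matrix`, `template_corners_in_image`, and `decompose` are hypothetical, not taken from the PoseMatch-TDCM repository.

```python
import math
from typing import List, Tuple

Matrix2 = List[List[float]]
Point = Tuple[float, float]


def pose_matrix(theta_deg: float, sx: float, sy: float) -> Matrix2:
    """Linear part of the pose, A = R(theta) @ diag(sx, sy).

    Assumed convention (not confirmed by the paper): scale along the
    template's own x/y axes first, then rotate. In image coordinates
    (y pointing down) a positive angle appears clockwise on screen.
    """
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c * sx, -s * sy],
            [s * sx, c * sy]]


def template_corners_in_image(cx: float, cy: float, theta_deg: float,
                              sx: float, sy: float,
                              w: float, h: float) -> List[Point]:
    """Map the corners of a w x h template (centered at the origin) into the image."""
    a = pose_matrix(theta_deg, sx, sy)
    corners = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Apply the linear part, then translate by the predicted center.
    return [(cx + a[0][0] * x + a[0][1] * y,
             cy + a[1][0] * x + a[1][1] * y) for x, y in corners]


def decompose(a: Matrix2) -> Tuple[float, float, float]:
    """Recover (theta_deg, sx, sy) from A = R(theta) @ diag(sx, sy), assuming sx, sy > 0.

    Column 0 is sx * (cos, sin) and column 1 is sy * (-sin, cos), so the
    column lengths give the scales and column 0's direction gives the angle.
    A matrix containing shear is not of this form and is not handled here.
    """
    sx = math.hypot(a[0][0], a[1][0])
    sy = math.hypot(a[0][1], a[1][1])
    theta_deg = math.degrees(math.atan2(a[1][0], a[0][0]))
    return theta_deg, sx, sy


if __name__ == "__main__":
    # Hypothetical prediction: center (320, 240), 30 degrees, sx = 1.2, sy = 0.8,
    # for a 100 x 50 template.
    for x, y in template_corners_in_image(320, 240, 30.0, 1.2, 0.8, 100, 50):
        print(f"({x:.1f}, {y:.1f})")
    print(decompose(pose_matrix(30.0, 1.2, 0.8)))
```

Applying the scales before the rotation keeps the columns of A orthogonal, so a rectangular template stays rectangular after the pose is applied. Reversing the order, diag(sx, sy)·R(θ), makes the columns non-orthogonal whenever sx ≠ sy and the angle is not a multiple of 90°, which maps the rectangle to a sheared parallelogram; `decompose` above only inverts the R(θ)·diag(sx, sy) form.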




Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80
Self-citation rate: 10.60%
Articles published: 2045
Review time: 8.7 months

Journal description: Expert Systems with Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used in industry, government, and universities worldwide. The journal emphasizes original papers that cover the design, development, testing, implementation, and management of these systems and offer practical guidelines. It spans sectors such as finance, engineering, marketing, law, project management, information management, and medicine. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and related areas, but excludes applications to military and defense systems.