TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition

Impact factor 5.3; Zone 2, Computer Science; JCR Q2, Robotics

IEEE Robotics and Automation Letters Pub Date : 2025-07-03 DOI:10.1109/LRA.2025.3585715

Oliver Grainge;Michael J. Milford;Indu Bodala;Sarvapali D. Ramchurn;Shoaib Ehsan

Volume 10, Issue 8, pp. 8396-8403. https://ieeexplore.ieee.org/document/11067943/

Citations: 0

Abstract

Visual Place Recognition (VPR) localizes a query image by matching it against a database of geo-tagged reference images, making it essential for navigation and mapping in robotics. Although Vision Transformer (ViT) solutions deliver high accuracy, their large models often exceed the memory and compute budgets of resource-constrained platforms such as drones and mobile robots. To address this issue, we propose TeTRA, a ternary transformer approach that progressively quantizes the ViT backbone to 2-bit precision and binarizes its final embedding layer, offering substantial reductions in model size and latency. A carefully designed progressive distillation strategy preserves the representational power of a full-precision teacher, allowing TeTRA to retain or even surpass the accuracy of uncompressed convolutional counterparts, despite using fewer resources. Experiments on standard VPR benchmarks demonstrate that TeTRA reduces memory consumption by up to 69% compared to efficient baselines, while lowering inference latency by 35%, with either no loss or a slight improvement in recall@1. These gains enable high-accuracy VPR on power-constrained, memory-limited robotic platforms, making TeTRA an appealing solution for real-world deployment.
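The two compression steps named in the abstract, ternary weights in the backbone and a binarized final embedding, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the quantizer, so the per-tensor absolute-mean scaling below is an assumed scheme, and the function names `ternarize`, `binarize`, and `hamming_top1` are hypothetical. The progressive schedule and the distillation loss are not shown.

```python
import numpy as np

def ternarize(w, eps=1e-8):
    """Map float weights to {-1, 0, +1} plus one float scale.

    Assumption: per-tensor absolute-mean scaling; the paper's exact
    quantizer is not given in the abstract.
    """
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

def binarize(emb):
    """Sign-binarize embeddings and pack 8 bits into each byte."""
    bits = (emb > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_top1(query_code, db_codes):
    """Return the index of the closest database code and all distances."""
    xor = np.bitwise_xor(db_codes, query_code)       # (N, B) ^ (B,) broadcasts
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1)  # count differing bits
    return int(np.argmin(dist)), dist

# Toy usage: random vectors stand in for reference-image descriptors.
rng = np.random.default_rng(0)
db = rng.normal(size=(5, 64))
db_codes = binarize(db)
best, dist = hamming_top1(binarize(db[3]), db_codes)
```

A ternary weight needs at most 2 bits instead of 32 for float32, and a D-dimensional binary descriptor occupies D/8 bytes instead of 4D, which is where savings of this kind come from. The 69% memory and 35% latency figures in the abstract are the authors' measured results and are not reproduced by this sketch.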

Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.