TeTRA-VPR: A Ternary Transformer Approach for Compact Visual Place Recognition

IF 5.3 · Zone 2 (Computer Science) · JCR Q2 (Robotics)
Oliver Grainge;Michael J. Milford;Indu Bodala;Sarvapali D. Ramchurn;Shoaib Ehsan
DOI: 10.1109/LRA.2025.3585715
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 8, pp. 8396-8403
Publication date: 2025-07-03
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/11067943/
Citations: 0

Abstract

Visual Place Recognition (VPR) localizes a query image by matching it against a database of geo-tagged reference images, making it essential for navigation and mapping in robotics. Although Vision Transformer (ViT) solutions deliver high accuracy, their large models often exceed the memory and compute budgets of resource-constrained platforms such as drones and mobile robots. To address this issue, we propose TeTRA, a ternary transformer approach that progressively quantizes the ViT backbone to 2-bit precision and binarizes its final embedding layer, offering substantial reductions in model size and latency. A carefully designed progressive distillation strategy preserves the representational power of a full-precision teacher, allowing TeTRA to retain or even surpass the accuracy of uncompressed convolutional counterparts, despite using fewer resources. Experiments on standard VPR benchmarks demonstrate that TeTRA reduces memory consumption by up to 69% compared to efficient baselines, while lowering inference latency by 35%, with either no loss or a slight improvement in recall@1. These gains enable high-accuracy VPR on power-constrained, memory-limited robotic platforms, making TeTRA an appealing solution for real-world deployment.
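
To make the two compression steps named in the abstract concrete, the following is a minimal sketch of ternary weight quantization and binary-descriptor matching. The abstract does not give the quantizer or the retrieval procedure, so everything here is an assumption for illustration: the absmean scaling rule, the sign-based binarization, the Hamming-distance search, and the helper names `ternarize`, `binarize`, `hamming`, and `nearest` are hypothetical and are not taken from the paper.

```python
def ternarize(weights):
    """Map float weights to codes in {-1, 0, +1} plus one shared scale.

    Assumption: absmean scaling (scale = mean |w|), then round and clip.
    Each code fits in 2 bits, which is where a 2-bit backbone saves memory.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def binarize(embedding):
    """Pack the sign bit of each embedding dimension into one integer."""
    bits = 0
    for value in embedding:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a, b):
    """Count the bit positions where two packed descriptors differ."""
    return bin(a ^ b).count("1")


def nearest(query_bits, database_bits):
    """Return the index of the database descriptor closest to the query."""
    return min(
        range(len(database_bits)),
        key=lambda i: hamming(query_bits, database_bits[i]),
    )


# Toy data (hypothetical values, not results from the paper).
codes, scale = ternarize([0.9, -0.05, -1.2, 0.4])
database = [
    binarize(e) for e in ([0.3, -0.2, 0.8, -0.1], [-0.5, 0.6, -0.7, 0.2])
]
match = nearest(binarize([0.2, -0.4, 0.5, -0.3]), database)
```

In this toy run the small weight collapses to zero and the large ones saturate at plus or minus one, while the query is matched to the reference whose sign pattern it shares. A real system would apply the quantizer per layer during training and search a much larger descriptor database.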
Source journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.