TT-LCD: Tensorized-Transformer based Loop Closure Detection for Robotic Visual SLAM on Edge

Chenchen Ding, Hongwei Ren, Zhiru Guo, Minjie Bi, Changhai Man, Tingting Wang, Shuwei Li, Shaobo Luo, Rumin Zhang, Hao Yu
{"title":"TT-LCD: Tensorized-Transformer based Loop Closure Detection for Robotic Visual SLAM on Edge","authors":"Chenchen Ding, Hongwei Ren, Zhiru Guo, Minjie Bi, Changhai Man, Tingting Wang, Shuwei Li, Shaobo Luo, Rumin Zhang, Hao Yu","doi":"10.1109/ICARM58088.2023.10218828","DOIUrl":null,"url":null,"abstract":"Visual simultaneous localization and mapping (VSLAM) is one of the core technologies in autonomous driving, intelligent robots, metaverse and other fields. Besides, loop closure detection (LCD) is an essential component in VSLAM which can correct the drift and accumulated errors caused by the visual odometry (VO) front-end, and assist robot to build a globally consistent map. Over the years, several deep-learning methods have been proposed to address the task. However, the prior proposed neural network-based LCD models are heavy in model size, and difficult to be deployed on edge devices. In this paper, an LCD module based on the tensorized transformer model called TT-LCD is proposed. To obtain a tensorized transformer model with accuracy-complexity co-awareness which can be effectively deployed, we proposed a construction method for tensor compressed transformer model with tensor-train (TT) decomposition and a differential neural network architecture search (NAS) method for tensor rank selection. Experiments demonstrate that the TT-LCD realizes a model size 6.04 × smaller than uncompressed transformer model, 32.1 × smaller than the VGG model and achieves lower memory cost of about 134M on edge CPU with little loss of accuracy on pre-training dataset but even 2.13% higher average accuracy on NewCollege dataset compared with uncompressed DeiT-based model in LCD task.","PeriodicalId":220013,"journal":{"name":"2023 International Conference on Advanced Robotics and Mechatronics (ICARM)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Advanced Robotics and Mechatronics (ICARM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICARM58088.2023.10218828","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Visual simultaneous localization and mapping (VSLAM) is one of the core technologies in autonomous driving, intelligent robotics, the metaverse, and other fields. Loop closure detection (LCD) is an essential component of VSLAM: it corrects the drift and accumulated errors introduced by the visual odometry (VO) front-end and helps the robot build a globally consistent map. Over the years, several deep-learning methods have been proposed for this task. However, previously proposed neural-network-based LCD models are large and difficult to deploy on edge devices. In this paper, an LCD module based on a tensorized transformer model, called TT-LCD, is proposed. To obtain a tensorized transformer with accuracy-complexity co-awareness that can be deployed effectively, we propose a construction method for a tensor-compressed transformer model based on tensor-train (TT) decomposition, together with a differentiable neural architecture search (NAS) method for tensor rank selection. Experiments demonstrate that TT-LCD is 6.04× smaller than the uncompressed transformer model and 32.1× smaller than the VGG model, and reduces memory cost to about 134 MB on an edge CPU, with little loss of accuracy on the pre-training dataset and even 2.13% higher average accuracy on the NewCollege dataset than the uncompressed DeiT-based model in the LCD task.
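The compression described above comes from replacing the transformer's large weight matrices with tensor-train factorizations. The sketch below is not the authors' code: it is a minimal PyTorch illustration, under assumed mode sizes and TT ranks, of how a linear layer's weight can be stored as TT cores; TT-LCD additionally selects the ranks with a differentiable NAS procedure rather than fixing them by hand as done here.

```python
# Minimal sketch (not the authors' implementation): a linear layer whose
# weight matrix is stored as tensor-train (TT) cores. Mode sizes and TT
# ranks are illustrative assumptions only.
import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """Linear layer with its (in_dim x out_dim) weight held as TT cores.

    in_dim = prod(in_modes), out_dim = prod(out_modes); core k has shape
    (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).
    """

    def __init__(self, in_modes, out_modes, ranks, bias=True):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1
        self.cores = nn.ParameterList([
            nn.Parameter(0.02 * torch.randn(ranks[k], in_modes[k],
                                            out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        ])
        out_dim = 1
        for n in out_modes:
            out_dim *= n
        self.bias = nn.Parameter(torch.zeros(out_dim)) if bias else None

    def full_weight(self):
        # Contract the cores back into the dense matrix. Done here only for
        # clarity; a deployed TT layer contracts the cores with the input
        # directly and never materializes the full weight.
        w = self.cores[0]                               # (1, m1, n1, r1)
        for core in self.cores[1:]:
            w = torch.einsum('aijb,bkld->aikjld', w, core)
            a, i, k, j, l, d = w.shape
            w = w.reshape(a, i * k, j * l, d)
        return w.reshape(w.shape[1], w.shape[2])        # (in_dim, out_dim)

    def forward(self, x):                               # x: (batch, in_dim)
        y = x @ self.full_weight()
        return y + self.bias if self.bias is not None else y


# Toy usage: factor a 768 x 3072 feed-forward weight (DeiT-like shape,
# assumed here) into three TT cores with rank 8.
layer = TTLinear(in_modes=[8, 12, 8], out_modes=[16, 12, 16],
                 ranks=[1, 8, 8, 1])
y = layer(torch.randn(4, 768))                          # -> (4, 3072)
dense, tt = 768 * 3072, sum(p.numel() for p in layer.cores)
print(f'dense params: {dense}, TT params: {tt}, ratio: {dense / tt:.0f}x')
```

The parameter count of the TT cores grows only with the sum of the mode sizes times the squared ranks rather than with the product of the full dimensions, which is why keeping the ranks small (and choosing them well, e.g. via the paper's NAS-based rank selection) yields the reported reduction in model size.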