TT-LCD: Tensorized-Transformer based Loop Closure Detection for Robotic Visual SLAM on Edge

Chenchen Ding, Hongwei Ren, Zhiru Guo, Minjie Bi, Changhai Man, Tingting Wang, Shuwei Li, Shaobo Luo, Rumin Zhang, Hao Yu
{"title":"TT-LCD: Tensorized-Transformer based Loop Closure Detection for Robotic Visual SLAM on Edge","authors":"Chenchen Ding, Hongwei Ren, Zhiru Guo, Minjie Bi, Changhai Man, Tingting Wang, Shuwei Li, Shaobo Luo, Rumin Zhang, Hao Yu","doi":"10.1109/ICARM58088.2023.10218828","DOIUrl":null,"url":null,"abstract":"Visual simultaneous localization and mapping (VSLAM) is one of the core technologies in autonomous driving, intelligent robots, metaverse and other fields. Besides, loop closure detection (LCD) is an essential component in VSLAM which can correct the drift and accumulated errors caused by the visual odometry (VO) front-end, and assist robot to build a globally consistent map. Over the years, several deep-learning methods have been proposed to address the task. However, the prior proposed neural network-based LCD models are heavy in model size, and difficult to be deployed on edge devices. In this paper, an LCD module based on the tensorized transformer model called TT-LCD is proposed. To obtain a tensorized transformer model with accuracy-complexity co-awareness which can be effectively deployed, we proposed a construction method for tensor compressed transformer model with tensor-train (TT) decomposition and a differential neural network architecture search (NAS) method for tensor rank selection. Experiments demonstrate that the TT-LCD realizes a model size 6.04 × smaller than uncompressed transformer model, 32.1 × smaller than the VGG model and achieves lower memory cost of about 134M on edge CPU with little loss of accuracy on pre-training dataset but even 2.13% higher average accuracy on NewCollege dataset compared with uncompressed DeiT-based model in LCD task.","PeriodicalId":220013,"journal":{"name":"2023 International Conference on Advanced Robotics and Mechatronics (ICARM)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Advanced Robotics and Mechatronics (ICARM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICARM58088.2023.10218828","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Visual simultaneous localization and mapping (VSLAM) is one of the core technologies in autonomous driving, intelligent robotics, the metaverse, and other fields. Loop closure detection (LCD) is an essential component of VSLAM: it corrects the drift and accumulated errors introduced by the visual odometry (VO) front-end and helps the robot build a globally consistent map. Over the years, several deep-learning methods have been proposed for this task. However, previously proposed neural-network-based LCD models are large and difficult to deploy on edge devices. In this paper, an LCD module based on a tensorized transformer model, called TT-LCD, is proposed. To obtain a tensorized transformer with accuracy-complexity co-awareness that can be deployed effectively, we propose a construction method for a tensor-compressed transformer model based on tensor-train (TT) decomposition, together with a differentiable neural architecture search (NAS) method for tensor rank selection. Experiments demonstrate that TT-LCD is 6.04× smaller than the uncompressed transformer model and 32.1× smaller than the VGG model, and reduces memory cost to about 134 MB on an edge CPU, with little loss of accuracy on the pre-training dataset and even 2.13% higher average accuracy on the NewCollege dataset than the uncompressed DeiT-based model in the LCD task.
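The compression described above comes from replacing the transformer's large weight matrices with tensor-train factorizations. The sketch below is not the authors' code: it is a minimal PyTorch illustration, under assumed mode sizes and TT ranks, of how a linear layer's weight can be stored as TT cores; TT-LCD additionally selects the ranks with a differentiable NAS procedure rather than fixing them by hand as done here.

```python
# Minimal sketch (not the authors' implementation): a linear layer whose
# weight matrix is stored as tensor-train (TT) cores. Mode sizes and TT
# ranks are illustrative assumptions only.
import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """Linear layer with its (in_dim x out_dim) weight held as TT cores.

    in_dim = prod(in_modes), out_dim = prod(out_modes); core k has shape
    (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).
    """

    def __init__(self, in_modes, out_modes, ranks, bias=True):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == ranks[-1] == 1
        self.cores = nn.ParameterList([
            nn.Parameter(0.02 * torch.randn(ranks[k], in_modes[k],
                                            out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))
        ])
        out_dim = 1
        for n in out_modes:
            out_dim *= n
        self.bias = nn.Parameter(torch.zeros(out_dim)) if bias else None

    def full_weight(self):
        # Contract the cores back into the dense matrix. Done here only for
        # clarity; a deployed TT layer contracts the cores with the input
        # directly and never materializes the full weight.
        w = self.cores[0]                               # (1, m1, n1, r1)
        for core in self.cores[1:]:
            w = torch.einsum('aijb,bkld->aikjld', w, core)
            a, i, k, j, l, d = w.shape
            w = w.reshape(a, i * k, j * l, d)
        return w.reshape(w.shape[1], w.shape[2])        # (in_dim, out_dim)

    def forward(self, x):                               # x: (batch, in_dim)
        y = x @ self.full_weight()
        return y + self.bias if self.bias is not None else y


# Toy usage: factor a 768 x 3072 feed-forward weight (DeiT-like shape,
# assumed here) into three TT cores with rank 8.
layer = TTLinear(in_modes=[8, 12, 8], out_modes=[16, 12, 16],
                 ranks=[1, 8, 8, 1])
y = layer(torch.randn(4, 768))                          # -> (4, 3072)
dense, tt = 768 * 3072, sum(p.numel() for p in layer.cores)
print(f'dense params: {dense}, TT params: {tt}, ratio: {dense / tt:.0f}x')
```

The parameter count of the TT cores grows only with the sum of the mode sizes times the squared ranks rather than with the product of the full dimensions, which is why keeping the ranks small (and choosing them well, e.g. via the paper's NAS-based rank selection) yields the reported reduction in model size.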