Deep learning enhanced monocular visual odometry: Advancements in fusion mechanisms and training strategies

IF 4.2 | Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing | Pub Date: 2025-09-11 | DOI: 10.1016/j.imavis.2025.105732

E. Simsek, B. Ozyer

Image and Vision Computing, Volume 162, Article 105732. Source: https://www.sciencedirect.com/science/article/pii/S0262885625003208

Citations: 0

Abstract

Recent advances in deep learning have revolutionized robotic applications such as 3D mapping, visual navigation and autonomous control. Monocular Visual Odometry (MVO) represents a critical advancement in autonomous systems, particularly drones, utilizing single-camera setups to navigate complex environments effectively. This review explores MVO’s evolution from traditional methods to its integration with cutting-edge technologies like deep learning and semantic understanding. In this study, we explore the latest training strategies, innovations in model architecture, and advanced fusion techniques used in hybrid models that combine depth and semantic information. A comprehensive literature review traces the evolution of MVO techniques, highlighting key datasets and performance metrics. Section 2 outlines the problem, while Section 3 reviews the studies, charting the evolution of MVO techniques predating the advent of deep learning. Section 4 details the methodology, focusing on cutting-edge training strategies, advancements in architectural designs, and fusion techniques in hybrid models integrating depth and semantic information. Finally, Section 5 summarizes findings, discusses implications, and suggests future research directions.

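Source journal

Image and Vision Computing (Engineering Technology, Engineering: Electrical and Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months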

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.