YGC-SLAM: A visual SLAM based on improved YOLOv5 and geometric constraints for dynamic indoor environments
Juncheng ZHANG, Fuyang KE, Qinqin TANG, Wenming YU, Ming ZHANG
Virtual Reality Intelligent Hardware, Volume 7, Issue 1, Pages 62-82. Publication date: 2025-02-01. DOI: 10.1016/j.vrih.2024.05.001
https://www.sciencedirect.com/science/article/pii/S2096579624000214
Abstract
Background
Because visual simultaneous localization and mapping (SLAM) is primarily based on the assumption of a static scene, dynamic objects in the frame degrade system robustness and cause inaccurate position estimation. In this study, we propose YGC-SLAM, a visual SLAM system for dynamic indoor environments that builds on the ORB-SLAM2 framework and combines semantic and geometric constraints to improve positioning accuracy and robustness.
Methods
First, the recognition accuracy of YOLOv5 was improved by introducing the convolutional block attention module (CBAM) and an improved EIOU loss function, so that the prediction box converges quickly for better detection. The improved YOLOv5 was then added to the tracking thread to detect dynamic targets and eliminate dynamic points. Subsequently, multi-view geometric constraints were used to re-judge the points, further eliminating dynamic points while retaining more useful feature points; this prevents the semantic approach from over-eliminating feature points, which would cause map building to fail. The K-means clustering algorithm was used to accelerate this process by quickly determining the motion state of each cluster of pixel points. Finally, a keyframe strategy with redundancy removal was implemented to construct a clear 3D dense static point-cloud map.
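The geometric re-judging and clustering steps can be illustrated with a minimal sketch. It assumes the multi-view check is an epipolar-distance test against a known fundamental matrix, which the abstract does not specify; the function names, the farthest-point K-means initialisation, and the 1-pixel threshold are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def epipolar_distance(F, pts1, pts2):
    """Pixel distance from each point in pts2 to the epipolar line F @ x1."""
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])              # (N, 3) homogeneous points, frame 1
    x2 = np.hstack([pts2, ones])              # (N, 3) homogeneous points, frame 2
    lines = x1 @ F.T                          # row i is F @ x1_i = (a, b, c)
    num = np.abs(np.sum(lines * x2, axis=1))  # |a*u' + b*v' + c|
    den = np.hypot(lines[:, 0], lines[:, 1])  # sqrt(a^2 + b^2)
    return num / np.maximum(den, 1e-12)

def kmeans(points, k, iters=20):
    """Plain K-means with deterministic farthest-point initialisation (assumption)."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def dynamic_mask(F, pts1, pts2, k=2, thresh=1.0):
    """Mark a whole cluster as dynamic when its mean epipolar distance exceeds thresh."""
    dist = epipolar_distance(F, pts1, pts2)
    labels = kmeans(pts2, k)
    mask = np.zeros(len(pts2), dtype=bool)
    for j in range(k):
        members = labels == j
        if members.any() and dist[members].mean() > thresh:
            mask[members] = True
    return mask
```

Deciding the motion state once per cluster, rather than once per point, is what makes the clustering a speed-up in this sketch: the threshold test runs `k` times instead of once for every feature point.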
Results
Tests on the TUM dataset and in a real environment show that our algorithm reduces the absolute trajectory error by 98.22% and the relative trajectory error by 97.98% compared with the original ORB-SLAM2. It is also more accurate and has better real-time performance than similar algorithms such as DynaSLAM and DS-SLAM.
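To make the percentage figures concrete, the sketch below shows one common way such a reduction is computed: the RMSE of the per-pose translational error, followed by a relative improvement over the baseline. The abstract does not state the exact formula used, so this is an assumption; the function names are hypothetical, and the trajectories are assumed to be already time-associated and aligned.

```python
import numpy as np

def ate_rmse(gt_xyz, est_xyz):
    """RMSE of per-pose translational error for aligned, time-associated trajectories."""
    gt = np.asarray(gt_xyz, dtype=float)
    est = np.asarray(est_xyz, dtype=float)
    err = np.linalg.norm(gt - est, axis=1)    # Euclidean error at each pose
    return float(np.sqrt(np.mean(err ** 2)))

def reduction_percent(baseline, improved):
    """Relative error reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline
```

As an illustration with made-up numbers (not taken from the paper), a baseline RMSE of 0.5 m against an improved RMSE of 0.05 m corresponds to a 90% reduction.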
Conclusions
The YGC-SLAM system proposed in this study effectively eliminates the adverse effects of dynamic objects, allowing it to better complete positioning and map-building tasks in complex environments.