{"title":"ESVO2:直接视觉惯性里程计与立体事件相机","authors":"Junkai Niu;Sheng Zhong;Xiuyuan Lu;Shaojie Shen;Guillermo Gallego;Yi Zhou","doi":"10.1109/TRO.2025.3548523","DOIUrl":null,"url":null,"abstract":"Event-based visual odometry is a specific branch of visual simultaneous localization and mapping (SLAM) techniques, which aims at solving tracking and mapping subproblems (typically in parallel), by exploiting the special working principles of neuromorphic (i.e., event-based) cameras. Due to the motion-dependent nature of event data, explicit data association (i.e., feature matching) under large-baseline viewpoint changes is difficult to establish, making direct methods a more rational choice. However, state-of-the-art direct methods are limited by the high computational complexity of the mapping subproblem and the degeneracy of camera pose tracking in certain degrees of freedom (DoF) in rotation. In this article, we tackle these issues by building an event-based stereo visual-inertial odometry system, which is built upon a direct pipeline known as event-based stereo visual odometry (ESVO). Specifically, to speed up the mapping operation, we propose an efficient strategy for sampling contour points according to the local dynamics of events. The mapping performance is also improved in terms of structure completeness and local smoothness by merging the temporal stereo and static stereo results. To circumvent the degeneracy of camera pose tracking in recovering the pitch and yaw components of general 6-DoF motion, we introduce IMU measurements as motion priors via preintegration. To this end, a compact back-end is proposed for continuously updating the IMU bias and predicting the linear velocity, enabling an accurate motion prediction for camera pose tracking. The resulting system scales well with modern high-resolution event cameras and leads to better global positioning accuracy in large-scale outdoor environments. Extensive evaluations on five publicly available datasets featuring different resolutions and scenarios justify the superior performance of the proposed system against five state-of-the-art methods. Compared to ESVO, our new pipeline significantly reduces the camera pose tracking error by 40%–80% and 20%–80% in terms of absolute trajectory error and relative pose error, respectively; at the same time, the mapping efficiency is improved by a factor of five. We release our pipeline as an open-source software for future research in this field.","PeriodicalId":50388,"journal":{"name":"IEEE Transactions on Robotics","volume":"41 ","pages":"2164-2183"},"PeriodicalIF":9.4000,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ESVO2: Direct Visual-Inertial Odometry With Stereo Event Cameras\",\"authors\":\"Junkai Niu;Sheng Zhong;Xiuyuan Lu;Shaojie Shen;Guillermo Gallego;Yi Zhou\",\"doi\":\"10.1109/TRO.2025.3548523\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Event-based visual odometry is a specific branch of visual simultaneous localization and mapping (SLAM) techniques, which aims at solving tracking and mapping subproblems (typically in parallel), by exploiting the special working principles of neuromorphic (i.e., event-based) cameras. Due to the motion-dependent nature of event data, explicit data association (i.e., feature matching) under large-baseline viewpoint changes is difficult to establish, making direct methods a more rational choice. 
However, state-of-the-art direct methods are limited by the high computational complexity of the mapping subproblem and the degeneracy of camera pose tracking in certain degrees of freedom (DoF) in rotation. In this article, we tackle these issues by building an event-based stereo visual-inertial odometry system, which is built upon a direct pipeline known as event-based stereo visual odometry (ESVO). Specifically, to speed up the mapping operation, we propose an efficient strategy for sampling contour points according to the local dynamics of events. The mapping performance is also improved in terms of structure completeness and local smoothness by merging the temporal stereo and static stereo results. To circumvent the degeneracy of camera pose tracking in recovering the pitch and yaw components of general 6-DoF motion, we introduce IMU measurements as motion priors via preintegration. To this end, a compact back-end is proposed for continuously updating the IMU bias and predicting the linear velocity, enabling an accurate motion prediction for camera pose tracking. The resulting system scales well with modern high-resolution event cameras and leads to better global positioning accuracy in large-scale outdoor environments. Extensive evaluations on five publicly available datasets featuring different resolutions and scenarios justify the superior performance of the proposed system against five state-of-the-art methods. Compared to ESVO, our new pipeline significantly reduces the camera pose tracking error by 40%–80% and 20%–80% in terms of absolute trajectory error and relative pose error, respectively; at the same time, the mapping efficiency is improved by a factor of five. We release our pipeline as an open-source software for future research in this field.\",\"PeriodicalId\":50388,\"journal\":{\"name\":\"IEEE Transactions on Robotics\",\"volume\":\"41 \",\"pages\":\"2164-2183\"},\"PeriodicalIF\":9.4000,\"publicationDate\":\"2025-03-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Robotics\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10912788/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ROBOTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Robotics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10912788/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ROBOTICS","Score":null,"Total":0}
ESVO2: Direct Visual-Inertial Odometry With Stereo Event Cameras
Event-based visual odometry is a branch of visual simultaneous localization and mapping (SLAM) that solves the tracking and mapping subproblems (typically in parallel) by exploiting the special working principles of neuromorphic (i.e., event-based) cameras. Due to the motion-dependent nature of event data, explicit data association (i.e., feature matching) is difficult to establish under large-baseline viewpoint changes, making direct methods the more rational choice. However, state-of-the-art direct methods are limited by the high computational complexity of the mapping subproblem and by the degeneracy of camera pose tracking in certain rotational degrees of freedom (DoF). In this article, we tackle these issues by building an event-based stereo visual-inertial odometry system upon the direct pipeline known as event-based stereo visual odometry (ESVO). Specifically, to speed up the mapping operation, we propose an efficient strategy that samples contour points according to the local dynamics of events. Mapping performance is further improved, in terms of structure completeness and local smoothness, by merging the temporal-stereo and static-stereo results. To circumvent the degeneracy of camera pose tracking in recovering the pitch and yaw components of general 6-DoF motion, we introduce IMU measurements as motion priors via preintegration. To this end, we propose a compact back-end that continuously updates the IMU biases and predicts the linear velocity, enabling accurate motion prediction for camera pose tracking. The resulting system scales well to modern high-resolution event cameras and achieves better global positioning accuracy in large-scale outdoor environments. Extensive evaluations on five publicly available datasets featuring different resolutions and scenarios demonstrate the superior performance of the proposed system over five state-of-the-art methods. Compared to ESVO, our new pipeline reduces camera pose tracking error by 40%–80% in absolute trajectory error and by 20%–80% in relative pose error, while improving mapping efficiency by a factor of five. We release our pipeline as open-source software to support future research in this field.
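The use of IMU preintegration as a motion prior is central to the pose-tracking improvement described in the abstract. Below is a minimal Python sketch of standard IMU preintegration in the style of Forster et al., shown purely for illustration: the function names, the simplified noise-free integration model, and the z-up gravity convention are assumptions of this sketch, not details of the actual ESVO2 back-end (which also estimates the IMU biases online).

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def preintegrate_imu(gyro, accel, dts, bias_g, bias_a):
    """Accumulate relative rotation/velocity/position deltas between two
    camera-tracking instants from raw IMU samples (simplified, noise-free
    preintegration in the style of Forster et al.)."""
    dR = np.eye(3)     # relative rotation over the interval
    dv = np.zeros(3)   # relative velocity, expressed in the start frame
    dp = np.zeros(3)   # relative position, expressed in the start frame
    for w, a, dt in zip(gyro, accel, dts):
        a_c = a - bias_a                          # bias-corrected accel
        dp += dv * dt + 0.5 * (dR @ a_c) * dt**2  # integrate position
        dv += (dR @ a_c) * dt                     # integrate velocity
        dR = dR @ R.from_rotvec((w - bias_g) * dt).as_matrix()
    return dR, dv, dp

def predict_pose(R_wb, p_wb, v_w, dR, dv, dp, t_total,
                 g=np.array([0.0, 0.0, -9.81])):  # z-up world assumed
    """Turn the preintegrated deltas into a world-frame pose/velocity
    prediction, used as the motion prior that seeds pose tracking."""
    R_next = R_wb @ dR
    p_next = p_wb + v_w * t_total + 0.5 * g * t_total**2 + R_wb @ dp
    v_next = v_w + g * t_total + R_wb @ dv
    return R_next, p_next, v_next
```

A useful property of this formulation is that the deltas depend only on the IMU samples and the bias estimates, so they can be recomputed cheaply whenever the back-end updates the biases; the predicted pose then initializes the direct tracker close to the optimum, including the otherwise weakly observable pitch and yaw components.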
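Similarly, the contour-point sampling idea can be illustrated on a time-surface map (the per-pixel timestamp of the most recent event), a representation already used by ESVO. The sketch below is only one plausible reading of "sampling according to the local dynamics of events": the decay constant, gradient threshold, and selection rule are hypothetical choices for illustration, not the actual ESVO2 strategy.

```python
import numpy as np

def sample_contour_points(time_surface, t_now, tau=0.01,
                          grad_thresh=0.5, max_points=1000):
    """Pick a subset of 'active' contour pixels from a time-surface map
    (per-pixel timestamp of the most recent event). Pixels are kept where
    events are recent (high local dynamics) and the decayed surface has a
    strong spatial gradient (i.e., scene contours)."""
    # Exponential decay: pixels with recent events map to values near 1.
    activity = np.exp(-(t_now - time_surface) / tau)
    gy, gx = np.gradient(activity)      # spatial gradient of the surface
    grad_mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(grad_mag > grad_thresh * grad_mag.max())
    if len(xs) > max_points:
        # Keep only the strongest responses to bound the mapping cost.
        keep = np.argsort(grad_mag[ys, xs])[-max_points:]
        ys, xs = ys[keep], xs[keep]
    return np.stack([xs, ys], axis=1)   # (N, 2) pixel coordinates (x, y)
```

Feeding only such a bounded, high-activity subset of pixels to the stereo depth estimator is one way to obtain the kind of mapping speed-up the abstract reports, since per-frame cost then scales with the sample budget rather than the sensor resolution.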
About the Journal:
The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles.
Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.