MAEMOT: Pretrained MAE-Based Antiocclusion 3-D Multiobject Tracking for Autonomous Driving
Xiaofei Zhang, Zhengping Fan, Ying Shen, Yining Li, Yasong An, Xiaojun Tan
IEEE Transactions on Neural Networks and Learning Systems (journal article), published 2024-10-31
DOI: 10.1109/TNNLS.2024.3480148
Citations: 0
Abstract
Existing 3-D multiobject tracking (MOT) methods suffer from object occlusion in real-world traffic scenes, and previous works have struggled to give a reasonable answer to a fundamental question: "How can the interference from perception data loss caused by occlusion be overcome?" This article addresses that question by developing a novel pretrained movement-constrained masked autoencoder (M-MAE) for antiocclusion 3-D MOT, called MAEMOT. Specifically, the pretrained M-MAE adopts an efficient multistage transformer (MST) encoder and a spatiotemporal motion decoder to predict and reconstruct occluded point cloud data in accordance with the properties of object motion. The well-trained M-MAE model then extracts global features of occluded objects, keeping the features of the same object as consistent as possible across frames throughout the spatiotemporal sequence. Next, a proposal-based geometric graph aggregation (PG²A) module extracts and fuses the spatial features of each proposal, producing refined region-of-interest (RoI) components. Finally, an object association module that combines geometric and corner affinities matches the predicted occluded objects more robustly. Extensive evaluation shows that MAEMOT effectively overcomes the interference of occlusion and achieves improved 3-D MOT performance under challenging conditions.
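The abstract does not say how the geometric and corner affinities are defined or combined, so the following is only a minimal Python sketch of what such an association step could look like, not the authors' method. Every choice here is a hypothetical assumption: boxes are `(x, y, z, length, width, height, yaw)` tuples; the "geometric" affinity is a center-distance decay scaled by the volume ratio; the "corner" affinity decays with the mean distance between corresponding box corners; the two are blended with a weight `alpha`; and matching is greedy with a score `threshold`. The helper names (`box_corners`, `corner_affinity`, `geometric_affinity`, `combined_affinity`, `greedy_associate`) are illustrative only.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x, y, z, length, width, height, yaw), yaw in radians about +z.
Box = Tuple[float, float, float, float, float, float, float]
Point = Tuple[float, float, float]


def box_corners(box: Box) -> List[Point]:
    """Return the 8 corners of a yaw-rotated 3-D box in a fixed, repeatable order."""
    x, y, z, l, w, h, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (w / 2, -w / 2):
            for dz in (h / 2, -h / 2):
                # Rotate the local (dx, dy) offset about z, then translate to the box center.
                corners.append((x + c * dx - s * dy, y + s * dx + c * dy, z + dz))
    return corners


def corner_affinity(a: Box, b: Box, sigma: float = 1.0) -> float:
    """Hypothetical corner affinity in (0, 1]: decays with mean distance of matching corners."""
    pairs = zip(box_corners(a), box_corners(b))
    mean_dist = sum(math.dist(p, q) for p, q in pairs) / 8
    return math.exp(-mean_dist / sigma)


def geometric_affinity(a: Box, b: Box, sigma: float = 2.0) -> float:
    """Hypothetical geometric affinity: center-distance decay scaled by volume ratio."""
    center_dist = math.dist(a[:3], b[:3])
    vol_a, vol_b = a[3] * a[4] * a[5], b[3] * b[4] * b[5]
    return math.exp(-center_dist / sigma) * min(vol_a, vol_b) / max(vol_a, vol_b)


def combined_affinity(a: Box, b: Box, alpha: float = 0.5) -> float:
    """Blend the two cues; alpha weights the geometric term (assumed, not from the paper)."""
    return alpha * geometric_affinity(a, b) + (1 - alpha) * corner_affinity(a, b)


def greedy_associate(tracks: List[Box], detections: List[Box],
                     threshold: float = 0.3, alpha: float = 0.5):
    """Greedily match track/detection pairs in order of decreasing combined affinity."""
    scored = [
        (combined_affinity(t, d, alpha), i, j)
        for i, t in enumerate(tracks)
        for j, d in enumerate(detections)
    ]
    scored.sort(reverse=True)
    used_tracks, used_dets, matches = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # every remaining pair scores lower still
        if i in used_tracks or j in used_dets:
            continue
        used_tracks.add(i)
        used_dets.add(j)
        matches.append((i, j))
    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_tracks]
    unmatched_dets = [j for j in range(len(detections)) if j not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


# Usage: two existing tracks, three new detections (the third is far from both tracks).
tracks = [(0.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0), (10.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)]
detections = [(10.2, 0.1, 0.0, 4.0, 2.0, 1.5, 0.05), (0.1, -0.1, 0.0, 4.0, 2.0, 1.5, 0.0),
              (50.0, 50.0, 0.0, 4.0, 2.0, 1.5, 0.0)]
matches, lost, new = greedy_associate(tracks, detections)
print(sorted(matches), lost, new)  # [(0, 1), (1, 0)] [] [2]
```

Greedy matching is used here only to keep the sketch short and dependency-free; solving the assignment with the Hungarian algorithm (for example, SciPy's `linear_sum_assignment` on the negated affinity matrix) is a common alternative.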
Journal description:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.