Xuezhi Xiang, Yu Cui, Xi Wang, Mingliang Zhai, Abdulmotaleb El Saddik
{"title":"GloFP-MSF: monocular scene flow estimation with global feature perception","authors":"Xuezhi Xiang, Yu Cui, Xi Wang, Mingliang Zhai, Abdulmotaleb El Saddik","doi":"10.1007/s00530-024-01418-5","DOIUrl":null,"url":null,"abstract":"<p>Monocular scene flow estimation is a task that allows us to obtain 3D structure and 3D motion from consecutive monocular images. Previous monocular scene flow usually focused on the enhancement of image features and motion features directly while neglecting the utilization of motion features and image features in the decoder, which are equally crucial for accurate scene flow estimation. Based on the cross-covariance attention, we propose a global feature perception module (GFPM) and applie it to the decoder, which enables the decoder to utilize the motion features and image features of the current layer as well as the coarse estimation result of the scene flow of the previous layer effectively, thus enhancing the decoder’s recovery of 3D motion information. In addition, we also propose a parallel architecture of self-attention and convolution (PCSA) for feature extraction, which can enhance the global expression ability of extracted image features. Our proposed method demonstrates remarkable performance on the KITTI 2015 dataset, achieving a relative improvement of 17.6% compared to the baseline approach. Compared to other recent methods, the proposed model achieves competitive results.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"50 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimedia Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00530-024-01418-5","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Monocular scene flow estimation is a task that allows us to obtain 3D structure and 3D motion from consecutive monocular images. Previous monocular scene flow usually focused on the enhancement of image features and motion features directly while neglecting the utilization of motion features and image features in the decoder, which are equally crucial for accurate scene flow estimation. Based on the cross-covariance attention, we propose a global feature perception module (GFPM) and applie it to the decoder, which enables the decoder to utilize the motion features and image features of the current layer as well as the coarse estimation result of the scene flow of the previous layer effectively, thus enhancing the decoder’s recovery of 3D motion information. In addition, we also propose a parallel architecture of self-attention and convolution (PCSA) for feature extraction, which can enhance the global expression ability of extracted image features. Our proposed method demonstrates remarkable performance on the KITTI 2015 dataset, achieving a relative improvement of 17.6% compared to the baseline approach. Compared to other recent methods, the proposed model achieves competitive results.
期刊介绍:
This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.