GloFP-MSF: monocular scene flow estimation with global feature perception

IF 3.5; Zone 3 (Computer Science); Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Multimedia Systems, Pub Date: 2024-07-30, DOI: 10.1007/s00530-024-01418-5

Xuezhi Xiang, Yu Cui, Xi Wang, Mingliang Zhai, Abdulmotaleb El Saddik

Citations: 0

Abstract

Monocular scene flow estimation recovers 3D structure and 3D motion from consecutive monocular images. Previous monocular scene flow methods have usually focused on enhancing image features and motion features directly, while neglecting how those features are used in the decoder, which is equally crucial for accurate scene flow estimation. Building on cross-covariance attention, we propose a global feature perception module (GFPM) and apply it to the decoder. The module enables the decoder to make effective use of the motion features and image features of the current layer together with the coarse scene flow estimate from the previous layer, thereby improving the decoder’s recovery of 3D motion information. In addition, we propose a parallel architecture of self-attention and convolution (PCSA) for feature extraction, which enhances the global expressive ability of the extracted image features. On the KITTI 2015 dataset, the proposed method achieves a relative improvement of 17.6% over the baseline approach, and its results are competitive with other recent methods.
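
The abstract states that GFPM builds on cross-covariance attention but gives no implementation details. As a rough illustration only, the sketch below shows the generic channel-wise form of cross-covariance attention, in which the attention map is computed between feature channels rather than between tokens. The function name, the single-head layout, the temperature `tau`, the absence of learned projections, and the tensor shapes are all assumptions made for this example; it is not the authors' GFPM.

```python
import numpy as np


def cross_covariance_attention(q, k, v, tau=1.0):
    """Generic channel-wise (d x d) attention; NOT the paper's GFPM.

    q, k, v: arrays of shape (N, d), i.e. N tokens with d channels each.
    tau:     temperature (hypothetical fixed value; often learned in practice).
    Returns an array of shape (N, d).
    """
    # L2-normalise every channel (column) across the token axis so that
    # channel-to-channel similarities are bounded in [-1, 1].
    q_hat = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-12)
    k_hat = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-12)

    # (d, d) cross-covariance between query channels and key channels.
    logits = (q_hat.T @ k_hat) / tau

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    logits = logits - logits.max(axis=-1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)

    # Each output channel is a convex combination of the value channels.
    return (attn @ v.T).T


rng = np.random.default_rng(0)
n_tokens, n_channels = 64, 8  # e.g. a flattened 8x8 feature map, 8 channels
q = rng.standard_normal((n_tokens, n_channels))
k = rng.standard_normal((n_tokens, n_channels))
v = rng.standard_normal((n_tokens, n_channels))
out = cross_covariance_attention(q, k, v)
print(out.shape)  # (64, 8)
```

Because the attention map has size d x d regardless of the number of tokens, its cost grows linearly with spatial resolution, which is one plausible reason such a mechanism would suit the dense feature maps handled by a scene flow decoder.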

Source journal

Multimedia Systems (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 5.40

Self-citation rate: 7.70%

Articles published: 148

Review time: 4.5 months

About the journal: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.