Latent Object Embedding for Self-Supervised Monocular Depth Estimation

Impact factor 5.3 | Zone 3, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Shuai Wang;Ting Yu;Shan Pan;Wei Chen;Zehua Wang;Victor C. M. Leung;Zijian Tian
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 5, pp. 3548-3559
Publication date: 2025-03-18
Publication type: Journal Article
DOI: 10.1109/TETCI.2025.3547851
URL: https://ieeexplore.ieee.org/document/10930815/
Open access: No
Citations: 0

Abstract

Extracting 3D information from 2D images is highly significant, and self-supervised monocular depth estimation has demonstrated great potential in this field. However, existing methods primarily focus on estimating depth from immediate visual features, leading to severe foreground-background adhesion, which poses challenges for achieving precise depth estimation. In this paper, we propose a depth estimation method called LOEDepth, which can implicitly distinguish foreground objects from the background. In LOEDepth, a latent object embedding module is introduced, which leverages a set of learnable queries to generate latent object proposals from both immediate visual features extracted by the encoder and sparse object features derived through multi-scale deformable attention. These latent object proposals are utilized to perform soft classification on the decoded features to distinguish foreground objects from the background. Additionally, as depth boundaries do not always align with semantic boundaries, we propose a novel deep decoder to provide decoding features with rich spatial location retrieval and semantic information. Finally, two mask strategies are utilized to conceal pixels violating the scene's static assumption, so as to mitigate disruptions caused by abnormal pixels during self-supervised training. Experimental results on the KITTI and Make3D datasets demonstrate significant performance improvements and robust fine-grained scene depth estimation capabilities of the proposed method.
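
The soft classification step described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the abstract does not say how latent object proposals are scored against decoded features, so the dot-product similarity followed by a softmax, the `temperature` parameter, and every name below are assumptions made purely for illustration.

```python
import math


def soft_classify(pixel_features, proposals, temperature=1.0):
    """Softly assign each pixel feature to a set of proposal embeddings.

    pixel_features: list of per-pixel feature vectors (hypothetical decoded features)
    proposals:      list of proposal embeddings (hypothetical latent object proposals)
    Returns one probability distribution over proposals per pixel.
    """
    assignments = []
    for feat in pixel_features:
        # Assumed scoring rule: dot product between pixel feature and proposal.
        logits = [
            sum(f * p for f, p in zip(feat, prop)) / temperature
            for prop in proposals
        ]
        # Subtract the maximum before exponentiating for numerical stability.
        peak = max(logits)
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        assignments.append([w / total for w in weights])
    return assignments


# Toy input: three "pixels" with 2-D features and two proposals.
pixel_features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
proposals = [[1.0, 0.0], [0.0, 1.0]]
assignment = soft_classify(pixel_features, proposals)

for row in assignment:
    print([round(p, 3) for p in row])
```

In this toy setup, a pixel whose feature aligns with one proposal receives most of its probability mass from that proposal, while an ambiguous pixel is split evenly. A soft assignment of this kind keeps the grouping differentiable, which is one plausible reason to prefer it over a hard foreground/background label during self-supervised training.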
Source journal
CiteScore: 10.30
Self-citation rate: 7.50%
Articles published: 147
Journal description: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronics only publication. TETCI publishes six issues per year. Authors are encouraged to submit manuscripts in any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few such illustrative examples are glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, computational intelligence for the IoT and Smart-X technologies.