Panoramic Depth and Semantic Estimation With Frequency and Distortion Aware Convolutions

IF 2.2 | Zone 4, Computer Science | Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Image Processing Pub Date : 2025-09-02 DOI:10.1049/ipr2.70197

Bruno Berenguel-Baeta, Jesus Bermudez-Cameo, Jose J. Guerrero


Citations: 0

Abstract


Omnidirectional images are advantageous for understanding an environment because they provide 360-degree contextual information. However, their inherent characteristics make accurate object detection and segmentation, and good depth estimation, harder to obtain. To overcome these problems, we exploit convolutions in the frequency domain, which obtain a wider receptive field in each convolutional layer, and convolutions in the equirectangular projection, which cope with the image distortion. Together, the two convolutions make it possible to leverage the whole context information of omnidirectional images. Our experiments show that our proposal performs better than state-of-the-art methods on non-gravity-oriented panoramas and performs similarly, on oriented panoramas, to state-of-the-art methods specific to semantic segmentation or to monocular depth estimation, while outperforming the only other method that provides both tasks.
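The two ideas named in the abstract can be illustrated with a small standard-library sketch. It is not the authors' implementation, which the abstract does not describe. It only shows two well-known facts: multiplying spectra pointwise equals a circular convolution whose kernel spans the whole row (so a single frequency-domain layer can see the full 360 degrees), and the equirectangular projection stretches content horizontally by 1 / cos(latitude). All function names, the toy row, and the kernel values are assumptions made for this illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, kept simple on purpose."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_conv_direct(row, kernel):
    """Spatial-domain circular convolution; kernel has the same length as row."""
    n = len(row)
    return [sum(row[(i - j) % n] * kernel[j] for j in range(n)) for i in range(n)]


def circular_conv_fft(row, kernel):
    """Same result, obtained by pointwise multiplication of the two spectra."""
    spectrum = [a * b for a, b in zip(dft(row), dft(kernel))]
    return [v.real for v in dft(spectrum, inverse=True)]


def horizontal_stretch(v, height):
    """Horizontal stretch of the equirectangular projection at pixel row v."""
    # Pixel centres never sit exactly on a pole, so cos(latitude) is never zero.
    latitude = (0.5 - (v + 0.5) / height) * math.pi
    return 1.0 / math.cos(latitude)


# Hypothetical toy data: one panorama row that wraps around at 360 degrees.
row = [1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
# The kernel is as long as the row, i.e. its receptive field is the full row.
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

direct = circular_conv_direct(row, kernel)
via_fft = circular_conv_fft(row, kernel)
max_diff = max(abs(a - b) for a, b in zip(direct, via_fft))

stretch_equator = horizontal_stretch(255, 512)
stretch_near_pole = horizontal_stretch(10, 512)
```

Here max_diff is at the level of floating-point rounding, while stretch_near_pole is much larger than stretch_equator. That gap is why a fixed square kernel covers very different portions of the scene near the poles and at the equator, and why a distortion-aware convolution adapts its sampling to the row it is applied on.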


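Source journal

IET Image Processing (Engineering and Technology - Engineering: Electrical and Electronic)

CiteScore: 5.40

Self-citation rate: 8.70%

Articles published: 282

Review time: 6 months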

About the journal: The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.

Principal topics include:

Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.

Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.

Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.

Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.

Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.

Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.

Current Special Issue Call for Papers:

Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf

AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf

Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf

Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf