OAFuser: Toward Omni-Aperture Fusion for Light Field Semantic Segmentation

IEEE Transactions on Artificial Intelligence | Pub Date: 2024-09-11 | DOI: 10.1109/TAI.2024.3457931

Fei Teng;Jiaming Zhang;Kunyu Peng;Yaonan Wang;Rainer Stiefelhagen;Kailun Yang

IEEE Transactions on Artificial Intelligence, vol. 5, no. 12, pp. 6225-6239. Journal article. https://ieeexplore.ieee.org/document/10677512/

Citations: 0

Abstract

Light field cameras are capable of capturing intricate angular and spatial details. This allows for acquiring complex light patterns and details from multiple angles, significantly enhancing the precision of image semantic segmentation. However, two significant issues arise: 1) The extensive angular information of light field cameras contains a large amount of redundant data, which is overwhelming for the limited hardware resources of intelligent agents. 2) A relative displacement difference exists in the data collected by different microlenses. To address these issues, we propose an omni-aperture fusion model (OAFuser) that leverages dense context from the central view and extracts the angular information from subaperture images to generate semantically consistent results. To simultaneously streamline the redundant information from the light field cameras and avoid feature loss during network propagation, we present a simple yet very effective subaperture fusion module (SAFM). This module efficiently embeds subaperture images in angular features, allowing the network to process each subaperture image with a minimal computational demand of only ${\sim}1$ GFlops. Furthermore, to address the mismatched spatial information across viewpoints, we present a center angular rectification module (CARM) to realize feature resorting and prevent feature occlusion caused by misalignment. The proposed OAFuser achieves state-of-the-art performance on four UrbanLF datasets in terms of all evaluation metrics and sets a new record of $84.93\%$ in mIoU on the UrbanLF-Real Extended dataset, with a gain of ${+}3.69\%$. The source code for OAFuser is available at https://github.com/FeiBryantkit/OAFuser.
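The headline result above is reported in mIoU (mean intersection over union). As a reader aid, the following is a minimal sketch of how that metric is commonly computed from a confusion matrix. It is not taken from the OAFuser repository: the function name `mean_iou`, the `ignore_index=255` convention, and the choice to average only over classes that appear in the prediction or the ground truth are all assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in `pred` or `target`.

    Hypothetical helper for illustration; assumes every prediction at a
    non-ignored pixel lies in [0, num_classes).
    """
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()

    # Drop pixels whose ground-truth label is marked "ignore" (assumed 255).
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that actually occur, to avoid dividing by zero.
    present = union > 0
    if not present.any():
        return float("nan")
    return float((intersection[present] / union[present]).mean())


# Toy 2x2 label maps with two classes.
pred = [[0, 1], [1, 1]]
target = [[0, 1], [0, 1]]
print(round(mean_iou(pred, target, num_classes=2), 4))  # 0.5833 = (1/2 + 2/3) / 2
```

In the toy example, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12. Benchmark toolkits may differ in details such as accumulating the confusion matrix over the whole dataset before averaging, so reported numbers should be reproduced with the evaluation code that accompanies the dataset.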




Source journal

IEEE Transactions on Artificial Intelligence

CiteScore

7.70

Self-citation rate

0.00%

Articles published