Deep Learning for Light Field Saliency Detection

2019 IEEE/CVF International Conference on Computer Vision (ICCV) Pub Date: 2019-10-01 DOI: 10.1109/ICCV.2019.00893

Tiantian Wang, Yongri Piao, Huchuan Lu, Xiao Li, Lihe Zhang

Pages: 8837-8847

Citations: 79

Abstract

Research on 4D saliency detection has been limited by the lack of a large-scale 4D light field dataset. To address this, we introduce a new dataset to support subsequent research in 4D light field saliency detection. To the best of our knowledge, it is the largest light field dataset to date, providing 1465 all-focus images with human-labeled ground-truth masks and the corresponding focal stack for every light field image. To verify the effectiveness of the light field data, we first introduce a fusion framework with two CNN streams that take the focal stacks and the all-focus images as input. The focal stack stream uses a recurrent attention mechanism that adaptively learns to integrate every slice in the focal stack, benefiting from the features extracted from the good slices. Its output is then combined with the output map generated by the all-focus stream to make the saliency prediction. In addition, we introduce adversarial examples, created by intentionally adding noise to images, to help train the deep network and improve its robustness. The noise is designed by users; it is imperceptible but can fool CNNs into making wrong predictions. Extensive experiments show the effectiveness and superiority of the proposed model on popular evaluation metrics. The proposed method performs favorably against existing 2D, 3D and 4D saliency detection methods on the proposed dataset and the existing LFSD light field dataset. The code and results can be found at https://github.com/OIPLab-DUT/ICCV2019_Deeplightfield_Saliency. Moreover, to facilitate research in this field, all images we collected are shared in a ready-to-use manner.
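The idea of attending over focal stack slices and then fusing with the all-focus stream can be illustrated with a small NumPy sketch. This is not the authors' network: the function names `attend_focal_stack` and `fuse_maps`, the dot-product scoring, the fixed number of refinement steps, and the fixed blending weight `alpha` are all assumptions made for illustration, and the per-slice features are taken to be plain vectors rather than CNN feature maps.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def attend_focal_stack(slice_features, query, steps=3):
    """Iteratively attend over focal slices (hypothetical helper).

    slice_features: array of shape (N, D), one feature vector per focal slice.
    query:          array of shape (D,), initial state used to score slices.
    Returns the final state of shape (D,) and the last attention weights (N,).
    """
    state = np.asarray(query, dtype=float)
    weights = None
    for _ in range(steps):
        # Score each slice against the current state (dot-product attention).
        scores = slice_features @ state
        # Normalize scores so better-matching slices get larger weights.
        weights = softmax(scores)
        # Weighted sum of slice features gives the attended context.
        context = weights @ slice_features
        # Simple recurrent update: blend the old state with the new context.
        state = 0.5 * state + 0.5 * context
    return state, weights


def fuse_maps(focal_map, all_focus_map, alpha=0.5):
    """Blend two saliency maps with a fixed weight (assumed, not learned)."""
    return alpha * focal_map + (1.0 - alpha) * all_focus_map


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(12, 8))   # 12 focal slices, 8-dim features
    query = rng.normal(size=8)
    state, weights = attend_focal_stack(features, query)
    print("attention weights sum:", weights.sum())
    print("fused map shape:", fuse_maps(np.ones((4, 4)), np.zeros((4, 4))).shape)
```

In a trained model the scoring function and the fusion would be learned; the sketch only shows how attention weights let well-focused slices contribute more to the integrated representation.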




Source journal

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self-citation rate

0.00%

Number of publications