基于变压器注意力的时间聚合多不连续图像显著性预测

2022 International Conference on Robotics and Automation (ICRA) Pub Date : 2022-05-23 DOI:10.1109/icra46639.2022.9811544

Pin-Jie Huang, Chi-An Lu, Kuan-Wen Chen

{"title":"基于变压器注意力的时间聚合多不连续图像显著性预测","authors":"Pin-Jie Huang, Chi-An Lu, Kuan-Wen Chen","doi":"10.1109/icra46639.2022.9811544","DOIUrl":null,"url":null,"abstract":"In this paper, we aim to apply deep saliency prediction to automatic drone exploration, which should consider not only one single image, but multiple images from different view angles or localizations in order to determine the exploration direction. However, little attention has been paid to such saliency prediction problem over multiple-discontinuous-image and none of existing methods take temporal information into consideration, which may mean that the current predicted saliency map is not consistent with the previous predicted results. For this purpose, we propose a method named Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction Network (TA-MSNet). It utilizes a transformer-based attention module to correlate relative saliency information among multiple discontinuous images and, furthermore, applies the ConvLSTM module to capture the temporal information. Experiments show that the proposed TA-MSNet can estimate better and more consistent results than previous works for time series data.","PeriodicalId":341244,"journal":{"name":"2022 International Conference on Robotics and Automation (ICRA)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction with Transformer-Based Attention\",\"authors\":\"Pin-Jie Huang, Chi-An Lu, Kuan-Wen Chen\",\"doi\":\"10.1109/icra46639.2022.9811544\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we aim to apply deep saliency prediction to automatic drone exploration, which should consider not only one single image, but multiple images from different view angles or localizations in order to determine the exploration direction. However, little attention has been paid to such saliency prediction problem over multiple-discontinuous-image and none of existing methods take temporal information into consideration, which may mean that the current predicted saliency map is not consistent with the previous predicted results. For this purpose, we propose a method named Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction Network (TA-MSNet). It utilizes a transformer-based attention module to correlate relative saliency information among multiple discontinuous images and, furthermore, applies the ConvLSTM module to capture the temporal information. Experiments show that the proposed TA-MSNet can estimate better and more consistent results than previous works for time series data.\",\"PeriodicalId\":341244,\"journal\":{\"name\":\"2022 International Conference on Robotics and Automation (ICRA)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Robotics and Automation (ICRA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/icra46639.2022.9811544\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Robotics and Automation (ICRA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icra46639.2022.9811544","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

本文的目标是将深度显著性预测应用于无人机自动探测，该方法不仅要考虑单幅图像，而且要考虑不同视角或定位的多幅图像，以确定探测方向。然而，对于多幅不连续图像的显著性预测问题关注较少，现有的方法都没有考虑到时间信息，这可能意味着当前预测的显著性图与之前的预测结果不一致。为此，我们提出了一种时间聚合多不连续图像显著性预测网络(TA-MSNet)方法。它利用基于变换的注意力模块来关联多个不连续图像之间的相对显著性信息，并进一步应用ConvLSTM模块来捕获时间信息。实验表明，该方法对时间序列数据的估计结果比以往的方法更好、更一致。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction with Transformer-Based Attention

In this paper, we aim to apply deep saliency prediction to automatic drone exploration, which should consider not only one single image, but multiple images from different view angles or localizations in order to determine the exploration direction. However, little attention has been paid to such saliency prediction problem over multiple-discontinuous-image and none of existing methods take temporal information into consideration, which may mean that the current predicted saliency map is not consistent with the previous predicted results. For this purpose, we propose a method named Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction Network (TA-MSNet). It utilizes a transformer-based attention module to correlate relative saliency information among multiple discontinuous images and, furthermore, applies the ConvLSTM module to capture the temporal information. Experiments show that the proposed TA-MSNet can estimate better and more consistent results than previous works for time series data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 International Conference on Robotics and Automation (ICRA)

自引率

0.00%

发文量