Learning to Attend to Salient Targets in Driving Videos Using Fully Convolutional RNN

2018 21st International Conference on Intelligent Transportation Systems (ITSC). Publication date: 2018-11-01. DOI: 10.1109/ITSC.2018.8569438

Ashish Tawari, P. Mallela, Sujitha Martin

Citations: 18

Abstract

Driving involves processing rich audio, visual and haptic signals to make safe and calculated decisions on the road. Human vision plays a crucial role in this task, and analysis of gaze behavior can provide insight into the actions a driver takes upon seeing an object or region. A typical representation of gaze behavior is a saliency map. The work in this paper aims to predict this saliency map given a sequence of image frames. We develop strategies to address important topics for video saliency, including active gaze (i.e., gaze that is useful for driving), pixel- and object-level information, and suppression of non-negative pixels in the saliency maps. These strategies enabled the development of a novel pixel- and object-level saliency ground-truth dataset using real-world driving data around traffic intersections. We further propose a fully convolutional RNN architecture capable of handling time-sequence image data to estimate the saliency map. Our methodology shows promising results.
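
The abstract does not specify the recurrent cell, kernel sizes, channel counts, ground-truth construction or training loss, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a single-channel convolutional GRU (the paper may use a ConvLSTM or another variant), gaze fixations rendered as a normalized sum of Gaussians to form a ground-truth saliency map, a spatial softmax readout, and KL divergence as the comparison metric. All function and class names (conv2d_same, ConvGRUCell, gaze_to_saliency, predict_saliency, kl_divergence) are hypothetical, and the weights are random and untrained. The point it illustrates is that every gate is a convolution, so the hidden state keeps the spatial layout of the frames and the output is a per-pixel map.

```python
import math
import random
from typing import List

Grid = List[List[float]]  # single-channel 2-D map, indexed [row][col]


def conv2d_same(img: Grid, kernel: Grid) -> Grid:
    """Odd-sized 2-D cross-correlation with zero padding (output keeps input shape)."""
    h, w, k = len(img), len(img[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * img[y][x]
            out[i][j] = acc
    return out


def zip_map(fn, *grids: Grid) -> Grid:
    """Apply fn element-wise across same-shaped grids."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*grids)]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def random_kernel(rng: random.Random, k: int = 3, scale: float = 0.3) -> Grid:
    return [[rng.uniform(-scale, scale) for _ in range(k)] for _ in range(k)]


class ConvGRUCell:
    """Hypothetical single-channel ConvGRU: gates are convolutions, not dense layers."""

    GATES = ("xz", "hz", "xr", "hr", "xn", "hn")

    def __init__(self, seed: int = 0, k: int = 3):
        rng = random.Random(seed)
        self.w = {name: random_kernel(rng, k) for name in self.GATES}

    def _conv(self, img: Grid, name: str) -> Grid:
        return conv2d_same(img, self.w[name])

    def step(self, x: Grid, h: Grid) -> Grid:
        gate = lambda a, b: sigmoid(a + b)  # noqa: E731
        z = zip_map(gate, self._conv(x, "xz"), self._conv(h, "hz"))  # update gate
        r = zip_map(gate, self._conv(x, "xr"), self._conv(h, "hr"))  # reset gate
        rh = zip_map(lambda a, b: a * b, r, h)
        n = zip_map(lambda a, b: math.tanh(a + b), self._conv(x, "xn"), self._conv(rh, "hn"))
        # New state blends the previous state and the candidate, per pixel.
        return zip_map(lambda zz, hh, nn: (1 - zz) * hh + zz * nn, z, h, n)


def spatial_softmax(logits: Grid) -> Grid:
    """Normalize a score map into a saliency distribution that sums to 1."""
    m = max(max(row) for row in logits)
    exp = [[math.exp(v - m) for v in row] for row in logits]
    total = sum(map(sum, exp))
    return [[v / total for v in row] for row in exp]


def predict_saliency(frames: List[Grid], cell: ConvGRUCell, readout: Grid) -> Grid:
    """Run the recurrent cell over the frame sequence; read out the last state."""
    h = [[0.0] * len(frames[0][0]) for _ in frames[0]]
    for frame in frames:
        h = cell.step(frame, h)
    return spatial_softmax(conv2d_same(h, readout))


def gaze_to_saliency(height: int, width: int, fixations, sigma: float = 1.0) -> Grid:
    """Render (row, col) gaze fixations as a normalized sum of Gaussians."""
    grid = [[sum(math.exp(-((i - r) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
                 for r, c in fixations)
             for j in range(width)] for i in range(height)]
    total = sum(map(sum, grid))
    return [[v / total for v in row] for row in grid]


def kl_divergence(p: Grid, q: Grid, eps: float = 1e-12) -> float:
    """KL(p || q) between two saliency maps; zero-probability cells of p contribute 0."""
    return sum(pv * math.log(pv / max(qv, eps))
               for prow, qrow in zip(p, q) for pv, qv in zip(prow, qrow) if pv > 0)
```

A possible usage: build `frames` as a list of equally sized grayscale grids, call `predict_saliency(frames, ConvGRUCell(seed=0), random_kernel(random.Random(2)))`, and compare the result against `gaze_to_saliency(height, width, [(row, col)])` with `kl_divergence`. In a real system the kernels would be learned by minimizing such a loss over many sequences, and the cell would operate on multi-channel CNN features rather than raw single-channel pixels.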
