Automatic scanpath generation with deep recurrent neural networks

Daniel Simon, S. Sridharan, Shagan Sah, R. Ptucha, Christopher Kanan, Reynold J. Bailey

Proceedings of the ACM Symposium on Applied Perception, July 22, 2016. DOI: 10.1145/2931002.2948726

Citations: 6
Abstract
Many computer vision algorithms are biologically inspired and designed based on the human visual system. Convolutional neural networks (CNNs) are similarly inspired by the primary visual cortex in the human brain. However, a key difference between current visual models and the human visual system is how visual information is gathered and processed. We make eye movements to collect information from the environment for navigation and task performance. We also make specific eye movements to important regions in the stimulus to perform the task at hand quickly and efficiently. Researchers have used expert scanpaths to train novices to improve the accuracy of visual search tasks. One limitation of such a system is that an expert must examine each visual stimulus beforehand to generate the scanpaths. In order to extend the idea of gaze guidance to a new, unseen stimulus, there is a need for a computational model that can automatically generate expert-like scanpaths. We propose a model for automatic scanpath generation using a convolutional neural network (CNN) and long short-term memory (LSTM) modules. Our model uses LSTMs due to the temporal nature of eye movement data (scanpaths): the system predicts each fixation based on the previously examined locations.
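The abstract does not specify the network's exact layers or how fixations are encoded, so the following is only a minimal sketch of the general CNN + LSTM idea it describes: a CNN encodes the stimulus image into a feature vector, and an LSTM predicts the next fixation from those features together with the previous fixation location. The layer sizes, the small stand-in CNN backbone, and the choice of normalized (x, y) coordinates as the fixation representation are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class ScanpathPredictor(nn.Module):
    """Sketch of a CNN + LSTM scanpath generator (assumed architecture)."""

    def __init__(self, feat_dim=256, hidden_dim=512):
        super().__init__()
        # Small CNN feature extractor (a stand-in for a pretrained backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # The LSTM consumes [image features ; previous fixation] at each step.
        self.lstm = nn.LSTM(feat_dim + 2, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)  # next fixation as normalized (x, y)

    def forward(self, image, prev_fixations):
        # image: (B, 3, H, W); prev_fixations: (B, T, 2) with values in [0, 1]
        feats = self.cnn(image)                                    # (B, feat_dim)
        feats = feats.unsqueeze(1).expand(-1, prev_fixations.size(1), -1)
        x = torch.cat([feats, prev_fixations], dim=-1)             # (B, T, feat_dim + 2)
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out))                       # (B, T, 2) predicted fixations

# Example: predict the next 8 fixations for a single 224x224 image,
# conditioned on 8 previous (randomly generated) fixation locations.
model = ScanpathPredictor()
img = torch.randn(1, 3, 224, 224)
prev = torch.rand(1, 8, 2)
next_fix = model(img, prev)  # shape (1, 8, 2)
```

The recurrent state is what lets each predicted fixation depend on the locations examined before it, which is the temporal property the abstract cites as the reason for using LSTMs.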