Saliency Map Extraction in Human Crowd RGB Data
Authors: Minh Tri Nguyen, Prarinya Siritanawan, K. Kotani
Published in: 2019 58th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE), 2019-09-01
DOI: 10.23919/SICE.2019.8859898 (https://doi.org/10.23919/SICE.2019.8859898)
Citations: 1
Abstract
A saliency map of a crowded human scene is a prediction of the regions that attract human visual attention. Humans are able to analyze the context of a visual scene and focus their attention on the salient regions of a crowd. In this work, we propose a novel method for saliency prediction based on a convolutional neural network. Unlike classical work on crowd scenes, which relies on hand-crafted face features, our model extracts deep features using the convolutional layers of an image classification model and learns the global context using convolutional layers with large receptive fields. A self-attention mechanism is applied to detect the dependencies between elements of the feature maps. The model outperformed state-of-the-art methods on Eyecrowd, a dataset for saliency in human crowds.
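The abstract does not say which form of self-attention is used, so the following is only a minimal sketch of one common variant: scaled dot-product attention across the spatial positions of a feature map, with a residual connection. The function names, tensor shapes, and random projection matrices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum along the axis for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_2d(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention over spatial positions (assumed variant).

    features: (H, W, C) feature map; w_q, w_k: (C, D); w_v: (C, C).
    Returns (output, attention) with shapes (H, W, C) and (H*W, H*W).
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c)            # one row per spatial position
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # query, key, value projections
    scores = q @ k.T / np.sqrt(q.shape[-1])   # pairwise similarity of positions
    attention = softmax(scores, axis=-1)      # each row sums to 1
    out = attention @ v                       # mix features across positions
    return (x + out).reshape(h, w, c), attention  # residual connection


# Hypothetical sizes: a 4x5 feature map with 8 channels.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 5, 8))
w_q = rng.standard_normal((8, 4)) * 0.1
w_k = rng.standard_normal((8, 4)) * 0.1
w_v = rng.standard_normal((8, 8)) * 0.1
out, attn = self_attention_2d(feats, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (4, 5, 8) (20, 20)
```

Each row of `attn` describes how strongly one spatial position draws on every other position, which is one way to express the dependency between feature-map elements that the abstract mentions. In a trained model the projection matrices would be learned rather than sampled at random.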