Deriving high-level scene descriptions from deep scene CNN features

Akram Bayat, M. Pomplun
Published in: 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), 2017-11-01
DOI: 10.1109/IPTA.2017.8310111 (https://doi.org/10.1109/IPTA.2017.8310111)
Citations: 6

Abstract

In this paper, we build two computational models that estimate two dominant global properties of a scene, naturalness and openness, from its global spatial structure. Naturalness and openness are two dominant perceptual dimensions of a multidimensional space in which semantically similar scenes (e.g., corridor and hallway) are assigned to nearby points. In this model space, a real-world scene is represented by its overall shape rather than by local object information. We use a deep convolutional neural network (CNN) to generate features that are well suited to estimating these two global properties. The extracted features are integrated efficiently and fed into a linear support vector machine (SVM) that classifies naturalness versus man-madeness and openness versus closedness. Both properties of an input image can be predicted from activations in the lowest layer of a CNN that has been trained for a scene recognition task. The consistent results of the computational models in full and restricted spatial frequency ranges suggest that the representation of an image in the lowest layer of the deep scene CNN contains holistic information about the image, as it leads to the highest accuracy in modelling the global shape of the scene.
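The pipeline the abstract describes (CNN activations, integrated into a feature vector, then a linear SVM for a binary global property) can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the abstract does not say how the features are integrated, so per-channel average pooling is used as a stand-in; the "activations" are small synthetic grids, not outputs of a real scene CNN; and the SVM is a plain hinge-loss stochastic subgradient trainer written for illustration in place of a library implementation. The helper names (`global_average_pool`, `train_linear_svm`, `fake_activations`) are hypothetical.

```python
import random


def global_average_pool(feature_maps):
    """Collapse each channel (a 2-D grid of activations) to its mean value."""
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=20, seed=0):
    """Stochastic subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regulariser step: shrink the weights slightly on every sample.
            w = [(1.0 - lr * lam) * wj for wj in w]
            # Hinge-loss step: only samples inside the margin move the boundary.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


def fake_activations(label, rng, channels=4, size=3):
    """Synthetic stand-in for CNN activations (an assumption, not real data).

    Label +1 (say, "natural") activates the first half of the channels,
    label -1 (say, "man-made") activates the second half.
    """
    maps = []
    for c in range(channels):
        active = (c < channels // 2) == (label == 1)
        base = 1.0 if active else 0.0
        maps.append([[base + rng.uniform(-0.2, 0.2) for _ in range(size)]
                     for _ in range(size)])
    return maps


rng = random.Random(42)
train_labels = [1, -1] * 20
test_labels = [1, -1] * 10
X_train = [global_average_pool(fake_activations(l, rng)) for l in train_labels]
X_test = [global_average_pool(fake_activations(l, rng)) for l in test_labels]

w, b = train_linear_svm(X_train, train_labels)
correct = sum(predict(w, b, x) == l for x, l in zip(X_test, test_labels))
accuracy = correct / len(test_labels)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The same structure would apply to openness versus closedness by swapping the labels. The accuracy printed here reflects only the toy data and says nothing about the results reported in the paper.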