Semantic scene understanding through advanced object context analysis in image

IF 4.3 · Zone 3, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding · Pub Date: 2025-02-01 · DOI: 10.1016/j.cviu.2025.104299

Luis Hernando Ríos González, Sebastián López Flórez, Alfonso González-Briones, Fernando de la Prieta

Volume 252, Article 104299. Full text: https://www.sciencedirect.com/science/article/pii/S1077314225000220

Citations: 0

Abstract

Semantic scene understanding through advanced object context analysis in image

Advances in computer vision have concentrated mainly on interpreting visual data, often overlooking the significance of contextual differences across regions within an image. In contrast, our research introduces a model for indoor scene recognition built on the ‘attention’ paradigm. The model views attention as a response to the stimulus properties of the image: focus is ‘pulled’ towards the most visually salient zones, as represented in a saliency map. Attention is directed towards these zones based on uninterpreted semantic features of the image, such as luminance contrast, color, shape, and edge orientation. This neurobiologically plausible and computationally tractable approach offers a more nuanced understanding of scenes by prioritizing zones solely on the basis of their image properties.

The proposed model enhances scene understanding through an in-depth analysis of the object context in images. Scene recognition is achieved by extracting features from selected regions of interest within individual image frames using patch-based object detection techniques, which yields distinctive feature descriptors for the identified objects of interest. These descriptors are then subjected to semantic embedding, which uses distributed representations to transform the sparse feature vectors into dense semantic vectors within a learned latent space. Machine learning models trained on the embedded semantic representations then carry out the classification.

The model was evaluated on three image datasets: UIUC Sports-8, PASCAL VOC (Visual Object Classes), and a proprietary image set created by the authors. Compared to state-of-the-art methods, this paper presents a more robust approach to the abstraction and generalization of interior scenes. The proposed model achieved higher accuracy than existing models, which improved scene classification in the selected indoor environments. Our code is published here: https://github.com/sebastianlop8/Semantic-Scene-Object-Context-Analysis.git
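To make the stages named in the abstract concrete (saliency map, patch selection, descriptors, dense embedding, classification), the following is a minimal NumPy sketch. It is not the authors' implementation, which is available in the repository linked above. Every component is a simplified stand-in chosen for illustration: a box-filter center-surround contrast in place of the saliency model, a pooled luminance histogram in place of the object descriptors, PCA in place of the learned semantic embedding, and a nearest-centroid rule in place of the trained classifier. The synthetic images and all function names are hypothetical.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])  # Rec. 601 luma weights


def luminance(img):
    """Luminance of an H x W x 3 float RGB image with values in [0, 1]."""
    return np.clip(img @ LUMA, 0.0, 1.0)


def box_blur(x, k):
    """Mean filter with odd window k, computed from an integral image."""
    p = np.pad(x, k // 2, mode="edge")
    c = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = x.shape
    total = c[k:k + h, k:k + w] - c[:h, k:k + w] - c[k:k + h, :w] + c[:h, :w]
    return total / (k * k)


def saliency_map(img, center=3, surround=15):
    """Stand-in saliency: center-surround luminance contrast in [0, 1]."""
    lum = luminance(img)
    s = np.abs(box_blur(lum, center) - box_blur(lum, surround))
    return s / (s.max() + 1e-12)


def top_patches(sal, patch=16, k=4):
    """Corners (row, col) of the k grid patches with highest mean saliency."""
    gh, gw = sal.shape[0] // patch, sal.shape[1] // patch
    grid = sal[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    order = np.argsort(grid.mean(axis=(1, 3)).ravel())[::-1][:k]
    return [(int(i // gw) * patch, int(i % gw) * patch) for i in order]


def image_vector(img, patch=16, k=4, bins=8):
    """Stand-in descriptor: mean luminance histogram of the salient patches."""
    lum = luminance(img)
    hists = []
    for r, c in top_patches(saliency_map(img), patch, k):
        h, _ = np.histogram(lum[r:r + patch, c:c + patch],
                            bins=bins, range=(0.0, 1.0))
        hists.append(h / h.sum())
    return np.mean(hists, axis=0)


def fit_embedding(X, dim=3):
    """Stand-in embedding: PCA via SVD, a linear map learned from the data."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim].T


def nearest_centroid(Z_train, y_train, Z_test):
    """Stand-in classifier: assign each test point to the closest class mean."""
    classes = np.unique(y_train)
    cents = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((Z_test[:, None, :] - cents[None]) ** 2).sum(axis=-1)
    return classes[dist.argmin(axis=1)]


def make_scene(rng, label, size=64):
    """Hypothetical toy scene: dark background, one square whose
    brightness depends on the class label."""
    blob = 0.9 if label == 0 else 0.55
    img = 0.1 + rng.normal(0.0, 0.02, (size, size, 3))
    r, c = rng.integers(0, size - 20, 2)
    img[r:r + 20, c:c + 20] = blob
    return np.clip(img, 0.0, 1.0)


def run_demo(seed=0, n_train=20, n_test=10):
    """Train on synthetic scenes and return test accuracy."""
    rng = np.random.default_rng(seed)

    def dataset(n):
        y = np.repeat([0, 1], n)
        X = np.stack([image_vector(make_scene(rng, int(t))) for t in y])
        return X, y

    X_train, y_train = dataset(n_train)
    X_test, y_test = dataset(n_test)
    mean, W = fit_embedding(X_train)
    pred = nearest_centroid((X_train - mean) @ W, y_train, (X_test - mean) @ W)
    return float((pred == y_test).mean())


if __name__ == "__main__":
    print("toy test accuracy:", run_demo())
```

The point of the sketch is the data flow rather than the numbers: only the regions the saliency map ranks highest contribute to the descriptor, and the classifier sees the dense embedded vector rather than raw pixels. Results on this synthetic data say nothing about the accuracy reported in the paper.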

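Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80

Self-citation rate: 4.40%

Articles published: 112

Review time: 79 days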

About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems