Two-dimensional spatial orientation relation recognition between image objects

IF 5.1 · CAS Zone 2 (Engineering Technology) · JCR Q1 (ENGINEERING, MULTIDISCIPLINARY)
Gong Peiyong, Zheng Kai, Jiang Yi, Zhao Huixuan, Huai Honghao, Guan Ruijie
DOI: 10.1016/j.jestch.2025.102074
Journal: Engineering Science and Technology-An International Journal-Jestech, volume 67, Article 102074
Publication date: 2025-05-03
Publication type: Journal Article
Open access: no
Source: https://www.sciencedirect.com/science/article/pii/S2215098625001296
Citations: 0

Abstract

Recent advances in computer vision have concentrated on understanding the semantic features of images, particularly the spatial relations between objects, which are fundamental to visual scene understanding. This study addresses the recognition of two-dimensional spatial orientation relations and develops the Target Spatial Orientation Vector Field (TSOVF) algorithm, an end-to-end framework that explicitly models spatial orientation dependencies. TSOVF introduces a learnable spatial orientation vector field that encodes the spatial orientation relation in a deep convolutional neural network. The architecture has two branches: the T-branch locates object center points and classifies object categories via keypoint estimation, while the S-branch constructs a pixel-level spatial orientation vector field. Each vector in this field quantifies the angular orientation between a pair of objects, and the aggregated vectors determine the final spatial relation category. A dedicated fusion module combines features from both branches and outputs a structured list of triples recording the detected objects, their inter-object spatial orientations, and the associated confidence scores. Evaluated on a dataset derived from PASCAL VOC2012, TSOVF achieves 94.8% global accuracy and a class-balanced geometric mean (G-mean) of 0.798, which the authors take as evidence of robust performance across spatial configurations. For the dominant orientation categories, it attains up to 95.9% precision and a 94.7% F1-score. The authors present TSOVF as a foundational benchmark for spatial relation recognition, as a step toward fine-grained visual relationship detection, and as a reproducible framework for future research in spatial-semantic analysis.
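The abstract says that vectors sampled from the orientation field are aggregated into a single relation category and reported in a triple with a confidence score. The sketch below is one possible reading of that step, not the authors' implementation. It assumes an 8-way label set, vectors given as (dx, dy) in image coordinates pointing from the reference object toward the target object, a mean-resultant-length confidence, and a `((reference, target), label, confidence)` triple layout. None of these details are stated in the abstract, and the names `DIRECTIONS`, `aggregate_direction`, and `relation_triple` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical 8-way label set; the abstract does not list the real categories.
DIRECTIONS = ["right", "upper-right", "above", "upper-left",
              "left", "lower-left", "below", "lower-right"]

Vector = Tuple[float, float]


def aggregate_direction(vectors: List[Vector]) -> Tuple[str, float]:
    """Sum the sampled (dx, dy) vectors and bin the resulting angle.

    Returns (label, confidence). Confidence is the length of the summed
    vector divided by the summed lengths, so it lies in [0, 1] and equals
    1.0 only when all vectors point the same way (an assumed definition).
    """
    sum_x = sum_y = total_length = 0.0
    for dx, dy in vectors:
        sum_x += dx
        sum_y += dy
        total_length += math.hypot(dx, dy)
    if total_length == 0.0:
        raise ValueError("need at least one non-zero vector")

    # Image rows grow downward, so negate y to get a conventional angle
    # (0 degrees = right, 90 degrees = above).
    angle = math.degrees(math.atan2(-sum_y, sum_x)) % 360.0
    # Each label covers a 45-degree sector centred on its axis.
    index = int(((angle + 22.5) % 360.0) // 45.0)
    confidence = math.hypot(sum_x, sum_y) / total_length
    return DIRECTIONS[index], confidence


def relation_triple(reference: str, target: str, vectors: List[Vector]):
    """Build one ((reference, target), label, confidence) triple (assumed layout)."""
    label, confidence = aggregate_direction(vectors)
    return ((reference, target), label, confidence)


if __name__ == "__main__":
    # Two slightly different vectors, both pointing mostly upward in the image.
    print(relation_triple("person", "kite", [(0.0, -2.0), (0.1, -1.0)]))
```

Summing the vectors before taking the angle avoids the wrap-around problem of averaging raw angles, for example 359 and 1 degrees averaging to 180.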
Source journal
Engineering Science and Technology-An International Journal-Jestech
Subject area: Materials Science-Electronic, Optical and Magnetic Materials
CiteScore: 11.20
Self-citation rate: 3.50%
Articles published: 153
Review time: 22 days
About the journal: Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology) is a peer-reviewed quarterly engineering journal. It publishes high-quality theoretical and experimental papers of permanent interest, not previously published in journals, in engineering and applied science, and aims to promote the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews, and communications in the broadly defined field of engineering science and technology. The scope of JESTECH covers a wide spectrum of subjects, including:
- Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
- Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
- Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)