Two-dimensional spatial orientation relation recognition between image objects
Gong Peiyong, Zheng Kai, Jiang Yi, Zhao Huixuan, Huai Honghao, Guan Ruijie
Engineering Science and Technology, an International Journal (JESTECH), vol. 67, Article 102074. Published 2025-05-03. DOI: 10.1016/j.jestch.2025.102074
Journal impact factor: 5.1. JCR: Q1 (Engineering, Multidisciplinary).
https://www.sciencedirect.com/science/article/pii/S2215098625001296
Citations: 0
Abstract
Recent advances in computer vision have concentrated on understanding the semantic features of images, particularly the spatial relations between objects, which are fundamental to visual scene understanding. This study addresses the recognition of two-dimensional spatial orientation relations and develops the Target Spatial Orientation Vector Field (TSOVF) algorithm, a novel end-to-end framework that explicitly models spatial orientation dependencies. The TSOVF algorithm introduces a learnable spatial orientation vector field that encodes the spatial orientation relation into a deep convolutional neural network model. The proposed architecture has a dual-branch design: the T-branch locates object center points and classifies object categories via keypoint estimation, while the S-branch constructs a pixel-level spatial orientation vector field. Each vector in this field quantifies the angular orientation between a pair of objects, and the aggregated vectors determine the final spatial relation category. A dedicated fusion module combines features from both branches and generates a structured list of triples that records the detected objects, their pairwise spatial orientations, and the associated confidence scores. Evaluated on a dataset derived from PASCAL VOC2012, the TSOVF algorithm achieves 94.8% global accuracy and a class-balanced geometric mean (G-mean) of 0.798, indicating robust performance across various spatial configurations. For the dominant orientation categories, the algorithm attains up to 95.9% precision and a 94.7% F1-score, which establishes it as a baseline for spatial relation recognition. These results support TSOVF's capacity to advance fine-grained visual relationship detection while providing a reproducible framework for future research in spatial-semantic analysis.
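To make the angular-orientation idea and the G-mean figure concrete, here is a minimal Python sketch. It is not the TSOVF network: it only shows how the angle between two object center points can be binned into an orientation category, and how a geometric mean of per-class recalls can sit well below overall accuracy. The eight 45-degree sectors, the label names, and the reading of G-mean as the geometric mean of per-class recalls are all assumptions, since the abstract does not specify them.

```python
import math

# Hypothetical label set: eight 45-degree sectors. The abstract does not
# list the paper's actual orientation categories.
SECTORS = ["right", "upper-right", "above", "upper-left",
           "left", "lower-left", "below", "lower-right"]


def orientation_angle(subject_center, reference_center):
    """Angle in degrees, in [0, 360), of the subject as seen from the reference.

    Centers are (x, y) pixel coordinates. Image y grows downward, so the
    vertical difference is flipped to make "above" correspond to 90 degrees.
    """
    dx = subject_center[0] - reference_center[0]
    dy = reference_center[1] - subject_center[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


def orientation_label(subject_center, reference_center):
    """Bin the angle into one of the assumed 45-degree sectors."""
    angle = orientation_angle(subject_center, reference_center)
    # Shift by half a sector so that each sector is centered on its direction.
    index = int(((angle + 22.5) % 360.0) // 45.0)
    return SECTORS[index]


def g_mean(per_class_recalls):
    """Geometric mean of per-class recalls (assumed definition of G-mean)."""
    if not per_class_recalls:
        raise ValueError("at least one recall value is required")
    return math.prod(per_class_recalls) ** (1.0 / len(per_class_recalls))


if __name__ == "__main__":
    # Subject centered 100 pixels higher in the image than the reference.
    print(orientation_label((100, 50), (100, 150)))   # above
    # One weak class pulls the geometric mean down sharply.
    print(g_mean([1.0, 0.25]))                         # 0.5
```

Under this assumed definition, a single poorly recognized orientation class lowers the G-mean even when overall accuracy is high, which is consistent with the gap between the reported 94.8% accuracy and the G-mean of 0.798.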
About the journal:
Engineering Science and Technology, an International Journal (JESTECH) (formerly Technology) is a peer-reviewed quarterly engineering journal. It publishes high-quality theoretical and experimental papers of permanent interest, not previously published in journals, in the field of engineering and applied science, with the aim of promoting the theory and practice of technology and engineering. In addition to peer-reviewed original research papers, the Editorial Board welcomes original research reports, state-of-the-art reviews and communications in the broadly defined field of engineering science and technology.
The scope of JESTECH covers a wide spectrum of subjects, including:
-Electrical/Electronics and Computer Engineering (Biomedical Engineering and Instrumentation; Coding, Cryptography, and Information Protection; Communications, Networks, Mobile Computing and Distributed Systems; Compilers and Operating Systems; Computer Architecture, Parallel Processing, and Dependability; Computer Vision and Robotics; Control Theory; Electromagnetic Waves, Microwave Techniques and Antennas; Embedded Systems; Integrated Circuits, VLSI Design, Testing, and CAD; Microelectromechanical Systems; Microelectronics, and Electronic Devices and Circuits; Power, Energy and Energy Conversion Systems; Signal, Image, and Speech Processing)
-Mechanical and Civil Engineering (Automotive Technologies; Biomechanics; Construction Materials; Design and Manufacturing; Dynamics and Control; Energy Generation, Utilization, Conversion, and Storage; Fluid Mechanics and Hydraulics; Heat and Mass Transfer; Micro-Nano Sciences; Renewable and Sustainable Energy Technologies; Robotics and Mechatronics; Solid Mechanics and Structure; Thermal Sciences)
-Metallurgical and Materials Engineering (Advanced Materials Science; Biomaterials; Ceramic and Inorganic Materials; Electronic-Magnetic Materials; Energy and Environment; Materials Characterization; Metallurgy; Polymers and Nanocomposites)