Deep Hybrid Manifold Network with joint metric learning for image set classification
Yujie Wu, Hengliang Tan, Jiao Du, Shuo Yang, Guofeng Yan
Image and Vision Computing, vol. 162, Article 105647 (published 2025-07-14). DOI: 10.1016/j.imavis.2025.105647
Many studies have shown that complex visual data exhibit non-linear and non-Euclidean characteristics. Finding an intrinsic, low-dimensional representation of non-linear visual data is crucial for image set classification. By combining the representational power of deep neural networks with the ability of manifold learning to exploit intrinsic structure, deep Riemannian neural networks have performed well on non-linear, non-Euclidean data. However, they have two limitations. First, they usually explore the intrinsic structure of a single manifold, whereas complex visual data may contain multiple potential intrinsic structures. Second, cross-entropy is usually adopted as the sole loss function, which may discard discriminative metric information. In this paper, we propose a deep Riemannian neural network that fuses Symmetric Positive Definite (SPD) and Grassmann manifolds to explore multiple intrinsic structures in complex visual data. We employ the Jensen–Bregman LogDet Divergence and the Projection metric to construct two metric learning regularization terms, over the SPD and Grassmann manifold networks respectively, that capture the intra-class and inter-class data distributions. The regularization terms for the two manifolds are then learned jointly with the cross-entropy loss, fusing information from multiple losses. Extensive experiments on expression recognition, gesture recognition, and action recognition tasks demonstrate the superior performance of the proposed Riemannian network.
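The two manifold metrics named in the abstract have standard closed forms. For SPD matrices X and Y, the Jensen–Bregman LogDet Divergence is J(X, Y) = log det((X + Y)/2) - (1/2) log det(XY); for subspaces with orthonormal bases U and V, the squared Projection metric is (1/2)·||UUᵀ - VVᵀ||²_F. The NumPy sketch below computes both and plugs them into a joint loss of the form cross-entropy + λ_SPD·R_SPD + λ_Gr·R_Gr. It is an illustration under stated assumptions, not the authors' implementation: the regularizer form (mean intra-class distance minus mean inter-class distance), the weights `lam_spd` and `lam_gr`, and all function names are hypothetical, and the network layers that would produce the SPD and Grassmann features are not shown.

```python
import numpy as np


def jbld(X, Y):
    """Jensen-Bregman LogDet divergence between SPD matrices X and Y:
    J(X, Y) = log det((X + Y) / 2) - 0.5 * (log det X + log det Y)."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)


def projection_dist_sq(U, V):
    """Squared projection metric between the subspaces spanned by the
    orthonormal columns of U and V (both n x p): 0.5 * ||U U^T - V V^T||_F^2."""
    diff = U @ U.T - V @ V.T
    return 0.5 * np.sum(diff ** 2)


def metric_regularizer(points, labels, dist):
    """HYPOTHETICAL regularizer (not the paper's exact term): mean intra-class
    pairwise distance minus mean inter-class pairwise distance."""
    intra, inter = [], []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(points[i], points[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    intra_mean = np.mean(intra) if intra else 0.0
    inter_mean = np.mean(inter) if inter else 0.0
    return intra_mean - inter_mean


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; labels are integer class indices."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])


def joint_loss(logits, labels, spd_feats, grass_feats, lam_spd=0.1, lam_gr=0.1):
    """Cross-entropy plus one metric regularizer per manifold branch.
    lam_spd and lam_gr are HYPOTHETICAL weights chosen for illustration."""
    return (cross_entropy(logits, labels)
            + lam_spd * metric_regularizer(spd_feats, labels, jbld)
            + lam_gr * metric_regularizer(grass_feats, labels, projection_dist_sq))


# Usage on random stand-in features (4 samples, 2 classes).
rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1])


def random_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)  # symmetric and strictly positive definite


spd_feats = [random_spd(5) for _ in labels]
# Reduced QR gives an 8 x 3 matrix with orthonormal columns (a Grassmann point).
grass_feats = [np.linalg.qr(rng.standard_normal((8, 3)))[0] for _ in labels]
logits = rng.standard_normal((4, 2))
print(joint_loss(logits, labels, spd_feats, grass_feats))
```

When classes are well separated the regularizer is negative, so minimizing the joint loss favors compact classes that lie far apart under each manifold's metric. In a trainable network these terms would typically be computed on mini-batch features inside an autodiff framework rather than with NumPy loops.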
About the journal:
Image and Vision Computing aims primarily to provide an effective medium for exchanging the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding of the discipline by encouraging quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.