Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.131049
Pan Li, Xiaofang Yuan, Haozhi Xu, Jinlei Wang, Yaonan Wang
Title: Focus DETR: Focus detection transformer for ship wall-climbing robot real-time object detection (Neurocomputing, vol. 651, Article 131049)

Abstract: With the growing use of wall-climbing robots for ship paint removal in repair yards, detecting these robots in real time and positioning them accurately has become an important task. Because of changes in the external environment and the diversity of the robot's working postures, maintaining stable detection accuracy and real-time performance is challenging. To address this issue, a focus detection transformer (Focus-DETR) architecture is proposed. First, a spatial attention recursive gated convolution (Sa-Gn) module is employed in the final stage of the backbone to extract high-level features, achieving high accuracy while maintaining real-time speed. Second, to improve the extraction of key features of the wall-climbing robot, a hybrid encoder is introduced to integrate features from adjacent stages of the neck. Furthermore, a focal EIoU loss function in the detection head optimizes the width and height errors of the detection box and adjusts their loss weights; this improves the alignment of the predicted bounding-box center and suits lightweight deployment. Experimental results show that, compared with RT-DETR-R50, Focus DETR-S improves detection mAP by 6.0% on the wall-climbing robot dataset with inference speed very close to that of RT-DETR-R50, and it achieves a similar improvement on the UAVDT dataset.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.130857
Jingming Hou, Nazlia Omar, Sabrina Tiun, Saidah Saad, Qian He
Title: Text-centric disentangled representation interaction network for Multimodal Sentiment Analysis (Neurocomputing, vol. 651, Article 130857)

Abstract: With the rise of short video content, Multimodal Sentiment Analysis (MSA) has gained significant attention as a research hotspot. However, heterogeneity among the three modalities has emerged as a major challenge in fusing them. While some recent studies have attempted to reduce this heterogeneity by disentangling the modalities, they overlook two critical issues. First, they treat all three modalities equally during disentanglement, ignoring the central role of the text modality in MSA: as the primary carrier of semantic and emotional information, text serves as the backbone for sentiment interpretation and multimodal fusion. Second, after disentangling the modalities, they do not effectively leverage the unique features of each modality, relying instead on simple concatenation and a Transformer to combine similar and dissimilar features. To fully harness the potential of the text modality and the dissimilar features between modalities, we propose a Text-centric Disentangled Representation Interaction Network (TDRIN), consisting of two main modules. In the Disentangled Representation Learning (DRL) module, we decompose representations from different modalities into separate subspaces centered around the text modality, aiming to capture similar and dissimilar features among the modalities, and we apply various constraints to learn better features and improve predictions. To more effectively balance the similar and dissimilar features, we design the Disentangled Representation Fusion Network (DRFN) module, which fuses disentangled representations with the text modality at the center, fully exploiting the correlations among disentangled representations. Extensive experiments on the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets demonstrate that TDRIN outperforms state-of-the-art methods across various metrics; the F1 score surpasses the best-performing baseline by 3.19%, 0.96%, and 1.43% on the three datasets, respectively. Ablation studies further confirm the effectiveness of each module. TDRIN therefore effectively reduces the heterogeneity between modalities, improving performance on MSA tasks.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.130882
Tianlun Luo, Qiao Yuan, Boxuan Zhu, Steven Guan, Rui Yang, Jeremy S. Smith, Eng Gee Lim
Title: Exploring interaction concepts for human–object-interaction detection via global- and local-scale enhancing (Neurocomputing, vol. 651, Article 130882)

Abstract: Understanding the interactions between human–object (HO) pairs is the key to the human–object interaction (HOI) detection task. Visual understanding research has been significantly influenced by recent advances in linguistic-visual contrastive learning. In HOI detection studies, linguistic and visual features usually must be aligned when linguistic knowledge is used for enhancement, which typically demands extra training data or extended training time. In this study, an effective approach is proposed for utilizing multimodal knowledge to enhance HOI learning at global and instance scales. Performance on rare HOI categories is markedly improved by projection guided by linguistic knowledge at the global scale and by merging multimodal features at the instance scale. The proposed model achieves state-of-the-art performance on the HICO-Det benchmark, validating the effectiveness of the global- and local-scale multimodal learning approach.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.130949
Utku Erdoğan, Şahin Işık, Yıldıray Anagün, Gabriel Lord
Title: ExpTamed: An exponential tamed optimizer based on Langevin SDEs (Neurocomputing, vol. 651, Article 130949)

Abstract: This study presents a new optimizer that regularizes gradients in deep learning using a novel taming strategy, originally developed to control the growth of numerical solutions of stochastic differential equations. The method, ExpTamed, enhances stability and reduces the mean-square error over a short time horizon compared with existing techniques. Its practical effectiveness is rigorously evaluated on CIFAR-10, Tiny-ImageNet, and Caltech256 across diverse architectures. In direct comparisons with prominent optimizers such as Adam, ExpTamed demonstrates significant performance gains: it achieved increases in best top-1 test accuracy ranging from 0.86 to 2.76 percentage points on CIFAR-10, and up to 4.46 percentage points on Tiny-ImageNet (without a learning-rate schedule). On Caltech256, ExpTamed also yielded superior accuracy, precision, and Kappa metrics. These results quantify ExpTamed's capability to deliver enhanced performance in practical deep learning applications.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.130979
Ran Song, Shengxiang Gao, Xiaofei Gao, Cunli Mao, Zhengtao Yu
Title: MKE-PLLM: A benchmark for multilingual knowledge editing on pretrained large language model (Neurocomputing, vol. 651, Article 130979)

Abstract: Multilingual large language models (mLLMs) have demonstrated remarkable performance across various downstream tasks but are still plagued by factuality errors. Knowledge editing aims to correct these errors by modifying the internal knowledge of pretrained models. However, current knowledge editing methods focus primarily on monolingual settings, neglecting the complexities and interdependencies of multilingual scenarios, and benchmarks designed specifically for multilingual knowledge editing remain scarce. To address this gap, this paper constructs a novel multilingual knowledge editing benchmark that comprehensively evaluates methods for mLLMs in terms of accuracy, reliability, generalization, and consistency. To ensure the robustness and usability of the benchmark, we conducted detailed analysis and validation. We also propose a baseline method that adapts existing monolingual knowledge editing techniques to the multilingual environment. Extensive experimental results demonstrate the effectiveness of the constructed benchmark for evaluating multilingual knowledge editing.
Neurocomputing | Pub Date: 2025-07-18 | DOI: 10.1016/j.neucom.2025.131050
Gang Xu, Ao Shen, Yuchen Yang, Xiantong Zhen, Wei Chen, Jun Xu
Title: Joint super-resolution and inverse tone-mapping: A feature decomposition aggregation network and a new benchmark (Neurocomputing, vol. 651, Article 131050)

Abstract: Joint Super-Resolution and Inverse Tone-Mapping (joint SR-ITM) aims to increase the resolution and dynamic range of low-resolution, standard-dynamic-range images. Recent networks mainly resort to image decomposition techniques with complex multi-branch architectures, but fixed decomposition techniques largely restrict their power on diverse images. To exploit the potential of the decomposition mechanism, this paper generalizes it from the image domain to the broader feature domain. To this end, we propose a lightweight Feature Decomposition Aggregation Network (FDAN). In particular, we design a Feature Decomposition Block (FDB) that learns to separate detail and base feature maps, and we build a Hierarchical Feature Decomposition Group by cascading FDBs for powerful multi-level feature decomposition. Moreover, for better evaluation, we collect a large-scale dataset for joint SR-ITM, i.e., SRITM-4K, which provides versatile scenarios for robust model training and evaluation. Experimental results on two benchmark datasets demonstrate that FDAN is efficient and outperforms state-of-the-art methods on joint SR-ITM. The code for FDAN and the SRITM-4K dataset are available at https://github.com/CS-GangXu/FDAN.
Neurocomputing | Pub Date: 2025-07-17 | DOI: 10.1016/j.neucom.2025.130951
Zhihao Zhou, Li Zhang, Qile Liu, Gan Huang, Zhuliang Yu, Zhen Liang
Title: Emotion agent: Unsupervised deep reinforcement learning with distribution-prototype reward for continuous emotional EEG analysis (Neurocomputing, vol. 652, Article 130951)

Abstract: Continuous electroencephalography (EEG) signals are widely employed in affective brain-computer interface (aBCI) applications. However, only a subset of the continuously acquired EEG data is truly relevant to emotional processing, while the remainder is often noisy or unrelated. Manual annotation of these key emotional segments is impractical due to their dynamic and individualized nature. To address this challenge, we propose a novel unsupervised deep reinforcement learning framework, termed Emotion Agent, which automatically identifies and extracts the most informative emotional segments from continuous EEG signals. Emotion Agent initially utilizes a heuristic algorithm to perform a global search and generate prototype representations of the EEG signals. These prototypes guide the exploration of the signal space and highlight regions of interest. Furthermore, we design a distribution-prototype-based reward function that evaluates the interaction between samples and prototypes to ensure that the selected segments are both representative and relevant to the underlying emotional states. Finally, the framework is trained using Proximal Policy Optimization (PPO) to achieve stable and efficient convergence. Experimental results on three widely used datasets (covering both discrete and dimensional emotion recognition) show an average improvement of 13.46% when using the proposed Emotion Agent, demonstrating its significant enhancement of accuracy and robustness in downstream aBCI tasks.
Neurocomputing | Pub Date: 2025-07-17 | DOI: 10.1016/j.neucom.2025.130950
Bo Xu, Guoxu Li, Jie Wang, Zheng Wang, Jianfu Cao, Rong Wang, Feiping Nie
Title: Dynamic T-distributed stochastic neighbor graph convolutional networks for multi-modal contrastive fusion (Neurocomputing, vol. 652, Article 130950)

Abstract: As data acquisition technologies continue to advance, multi-modal data have become a prominent focus across many domains. This paper tackles critical challenges in multi-modal fusion, namely representation learning, learning modality-consistent (invariant) features, and learning diverse, complementary features, by employing graph convolutional networks and contrastive learning. Current GCN-based methods generally depend on predefined graphs for representation learning, limiting their capacity to capture local and global information effectively; moreover, some existing models do not adequately contrast consistent and diverse representations across modalities during fusion. To address these challenges, we propose a novel T-distributed Stochastic Neighbor Contrastive Graph Convolutional Network (TSNGCN), consisting of an adaptive static graph learning module, a multi-modal representation learning module, and a multi-modal contrastive fusion module. The adaptive static graph learning module constructs graphs without relying on predefined distance metrics, creating a pairwise graph adaptively to preserve the local structure of the data. A loss function based on t-distributed stochastic neighbor embedding is designed to learn the transformation between the embeddings and the original data, facilitating the discovery of more discriminative information within the learned subspace. In addition, the multi-modal contrastive fusion module maximizes the similarity of the same samples across different modalities while keeping dissimilar samples apart, reinforcing the model's consistency objective. Extensive experiments on several multi-modal benchmark datasets demonstrate the superiority and effectiveness of TSNGCN over existing methods.
Neurocomputing | Pub Date: 2025-07-17 | DOI: 10.1016/j.neucom.2025.130999
Mengxuan Sun, Xuebing Yang, Jiayi Geng, Jinghao Niu, Chutong Wang, Chang Cui, Xiuyuan Chen, Wen Tang, Wensheng Zhang
Title: CTMEG: A continuous-time medical event generation model for clinical prediction of long-term disease progression (Neurocomputing, vol. 651, Article 130999)

Abstract: Long-term health monitoring reflects a patient's disease progression and is critical for improving quality of life and supporting physicians' decision-making. Predictive models based on Electronic Health Records (EHRs) can offer substantial clinical support by alerting clinicians to subsequent disease-associated adverse events. Effective disease progression modeling involves two subtasks: (1) estimating the occurrence times of disease-associated events, and (2) classifying the types of events that occur. Recent time-aware disease predictive models, mainly based on recurrent neural networks or attention networks, specialize in predicting future disease types by accounting for the temporal irregularities in EHRs. This paper focuses on multi-step continuous-time disease prediction, which is more challenging because predictive models can easily fall into conflicts between the two subtasks. We propose a multi-task disentangled Continuous-Time Medical Event Generation (CTMEG) model that tackles both subtasks simultaneously. Unlike conventional continuous-time models, CTMEG encodes multi-view historical medical events and then simultaneously predicts multi-step disease types and occurrence times. First, a discrete Conditional Intensity Function (CIF) is designed to better estimate disease occurrence times from limited available data. Second, to reduce task conflicts, a gated network disentangles the rough patient representation into task-specific representations. Finally, a tailored CIF attention module reduces error accumulation during the prediction process. Extensive experiments on the eICU and BFH databases demonstrate that CTMEG outperforms twelve competing models in long-term disease progression prediction. Our code is available on GitHub.
Neurocomputing | Pub Date: 2025-07-17 | DOI: 10.1016/j.neucom.2025.130848
Nayiri Galestian Pour, Soudabeh Shemehsavar
Title: DCB-VIM: An ensemble learning based filter method for feature selection with imbalanced class distribution (Neurocomputing, vol. 651, Article 130848)

Abstract: Feature selection aims to improve predictive performance and interpretability when analyzing datasets with high-dimensional feature spaces. An imbalanced class distribution can make feature selection considerably more difficult, so robust methodologies are essential for this case. We therefore present a filter method based on ensemble learning in which each classifier is built on a randomly selected subspace of features. A variable importance measure is computed class-wise within each classifier, and a feature weighting procedure is then applied; the performance of each classifier is taken into account in the combination phase of the ensemble. The effects of the hyperparameters, namely the subspace size and the number of classification trees, on predictive performance are investigated through simulation studies. The efficiency of the proposed method is evaluated with respect to predictive performance under different selection strategies through real data analysis in the presence of class imbalance.