{"title":"Dynamic patch-aware enrichment transformer for occluded person re-identification","authors":"Xin Zhang , Keren Fu , Qijun Zhao","doi":"10.1016/j.knosys.2025.114193","DOIUrl":"10.1016/j.knosys.2025.114193","url":null,"abstract":"<div><div>Person re-identification (re-ID) continues to pose a significant challenge, particularly in scenarios involving occlusions. Prior approaches aimed at tackling occlusions have predominantly focused on aligning physical body features through the utilization of external semantic cues. However, these methods tend to be intricate and susceptible to noise. To address the aforementioned challenges, we present an innovative end-to-end solution known as the Dynamic Patch-aware Enrichment Transformer (DPEFormer). This model effectively distinguishes human body information from occlusions automatically and dynamically, eliminating the need for external detectors or precise image alignment. Specifically, we introduce a dynamic patch token selection module (DPSM). DPSM utilizes a label-guided proxy token as an intermediary to identify informative occlusion-free tokens. These tokens are then selected for deriving subsequent local part features. To facilitate the seamless integration of global classification features with the finely detailed local features selected by DPSM, we introduce a novel feature blending module (FBM). FBM enhances feature representation through the complementary nature of information and the exploitation of part diversity. Furthermore, to ensure that DPSM and the entire DPEFormer can effectively learn with only identity labels, we also propose a Realistic Occlusion Augmentation (ROA) strategy. This strategy leverages the recent advances in the Segment Anything Model (SAM) [1]. As a result, it generates occlusion images that closely resemble real-world occlusions, greatly enhancing the subsequent contrastive learning process. 
Experiments on occluded and holistic re-ID benchmarks show that DPEFormer substantially outperforms existing state-of-the-art approaches. The code is publicly available at <span><span>https://github.com/zhangxin06/DPEFormer</span></span>.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114193"},"PeriodicalIF":7.6,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144771176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active learning with joint probabilistic modeling for point cloud semantic segmentation","authors":"Baochen Yao , Dongjie Zhang , Jie Zhao , Ye Zheng , Chengbin Peng","doi":"10.1016/j.knosys.2025.114171","DOIUrl":"10.1016/j.knosys.2025.114171","url":null,"abstract":"<div><div>With advancements in sensing technologies, the demand for point cloud semantic segmentation has grown significantly across various applications, while current deep learning-based methods rely heavily on costly, well-annotated datasets. Recently, label-efficient learning strategies have been explored to reduce annotation demands, with active learning emerging as a preferred approach by selectively annotating only the most informative samples. However, existing point cloud active learning methods often depend solely on neural network softmax scores for sample selection, which can introduce bias and be affected by overconfidence in network predictions. To overcome this limitation, we propose an active learning framework with Joint Probabilistic modeling (JoPro), aiming to select unlabeled points that can provide more post-annotation information. At the core of JoPro is a novel probabilistic model that efficiently captures the distribution of embedded features to generate richer probabilistic representations for unlabeled data. Utilizing this probabilistic modeling, we propose a feature mixing stability metric to identify uncertain points near decision boundaries, ensuring more informative sample selection. Furthermore, a cluster-aware hybrid contrastive regularization method is incorporated to maximize the utilization of unlabeled data to enhance training of the segmentation model. 
Our proposed active learning framework achieves competitive results on popular benchmarks, delivering near fully supervised performance with only 1 % of the annotation budget.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"328 ","pages":"Article 114171"},"PeriodicalIF":7.6,"publicationDate":"2025-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144772768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LaF-transformer: Leveraging coupling text-graph embedding for semantic calibration in textual entailment","authors":"Shaokang Wang , Li Pan","doi":"10.1016/j.knosys.2025.114180","DOIUrl":"10.1016/j.knosys.2025.114180","url":null,"abstract":"<div><div>Textual Entailment (TE) aims to recognize the semantic relationships between premises and hypotheses, a fundamental task in Natural Language Inference (NLI). Given the growing complexity of inference information, Transformer-based methods have explored concatenating various structural and semantic information for further improvement. The semantics of input information vary across sources, so straightforward concatenation introduces semantic inconsistencies that cause negative generalization and decreased performance. To address this issue, we propose a novel Transformer-based method named Latent Fusion Transformer (LaFT). LaFT leverages latent topic information to couple the text and graph information, calibrating the semantic knowledge in TE. The latent topic information extracted from texts is employed to sample the graph information to construct text-graph embedding for rational concatenation of premises and hypotheses. The attention mechanism is often employed as a black-box module, lacking an explicit design for capturing the consistent semantics between inputs. To calibrate the semantics of inputs for TE, LaFT utilizes an attention scaling matrix derived from latent topic similarity to guide attention allocation during training. Considering the coupling of text-graph embedding and the calibration of semantics, LaFT improves the rational concatenation of premises and hypotheses in TE, offering a promising advancement in NLI. Extensive experiments are conducted on five public datasets with accuracy and macro F1-score evaluation metrics. 
On average, LaFT outperforms the state-of-the-art baselines in extensive experiments.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114180"},"PeriodicalIF":7.6,"publicationDate":"2025-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sample efficient reinforcement learning via low-rank regularization","authors":"Jiamin Liu , Heng Lian","doi":"10.1016/j.knosys.2025.114176","DOIUrl":"10.1016/j.knosys.2025.114176","url":null,"abstract":"<div><div>In this paper, the usefulness of low-rankness in state-action value function estimation is demonstrated using a simplified setup that is amenable to theoretical analysis. First, the concept of low-rank functions is defined motivated by standard functional analysis results. Subsequently, a specific procedure is proposed based on nuclear-norm penalized series estimation, in which the estimation of the low-rank function naturally leads to estimation of a low-rank matrix. Risk bounds are established for the estimator, which shows faster convergence rates compared to the standard estimator without using low-rankness. Several simulated toy examples are used as proof of concept to demonstrate the performances in simulations.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114176"},"PeriodicalIF":7.6,"publicationDate":"2025-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144724990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LLVM-drone: A synergistic framework integrating large language models and vision models for visual tasks in unmanned aerial vehicles","authors":"Yibiao Hu, You Zhou, Zhengqiang Zhu, Xi Yang, Han Zhang, Kun Bian, Hong Han","doi":"10.1016/j.knosys.2025.114190","DOIUrl":"10.1016/j.knosys.2025.114190","url":null,"abstract":"<div><div>The integration of Large Language Models (LLMs) and Visual Language Models (VLMs) with drone technology holds significant potential for enhancing Unmanned Aerial Vehicle (UAV) capabilities. However, a critical challenge arises from the inherent uncertainty and hallucination tendencies of LLMs when processing ambiguous natural language instructions and generating executable code. These limitations make LLMs unreliable for direct deployment in UAVs, particularly in vision-based tasks where precision and safety are paramount. To address these challenges, we propose LLVM-Drone, a novel framework that combines Domain-Guided Structured Prompt Execution Framework (DGSPEF) with lightweight task-specific vision models to ensure accurate code generation and reliable visual feedback. DGSPEF leverages structured prompts and domain knowledge to mitigate LLM hallucinations, translating user intent into precise executable commands, while lightweight vision models provide real-time perceptual validation. This approach enables zero-shot visual task execution without additional training, maintaining a separation between language understanding and visual processing. Extensive evaluations involving eight state-of-the-art LLMs demonstrate the effectiveness of LLVM-Drone across a wide range of UAV vision missions, including aerial object detection, precise localization, object tracking, and vision-language navigation. 
The framework has been successfully deployed on real UAV hardware for basic tasks, indicating its strong potential for future applications in real-world scenarios such as autonomous mapping, disaster response, precision agriculture, and infrastructure inspection.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114190"},"PeriodicalIF":7.6,"publicationDate":"2025-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating radius margin constraints and class variance for improved CNN-based image recognition","authors":"Qi Jiang , Yao Xiao , Gang Zhou , Guofang Liu , Zhen Li , Jia Luo , Kailin He , Shishi Liao","doi":"10.1016/j.knosys.2025.114188","DOIUrl":"10.1016/j.knosys.2025.114188","url":null,"abstract":"<div><div>Image recognition aims to classify images by learning predictive models, with Convolutional Neural Network (CNN) emerging as the dominant approach. While SVM-driven CNNs, which employ Support Vector Machine (SVM) as the energy function, outperform traditional Softmax-driven CNNs in generalization, they overlook the influence of the Minimum Enclosing Ball (MEB) radius on the generalization bounds and fail to utilize the overall sample distribution, limiting their performance.</div><div>To address these issues, we propose two models: (1) RMB-driven CNN, which incorporates the Radius Margin Bound (RMB) to optimize feature learning by maximizing inter-class margins while minimizing the MEB radius; (2) MCVSVM-driven CNN, which integrates Minimum Class Variance Support Vector Machine (MCVSVM) and Fisher’s discriminant theory to refine hyperplanes using sample distribution, enhancing feature discriminability.</div><div>Experiments on the FER2013, CIFAR-10, CIFAR-100, MNIST, and SVHN datasets using AlexNet, VGGNet, and ResNet demonstrate that the proposed models achieve superior feature extraction and recognition accuracy compared to existing methods, validating their effectiveness and robustness.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114188"},"PeriodicalIF":7.6,"publicationDate":"2025-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144758015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sliced Wasserstein weighted multimodal mambavision for emotion recognition","authors":"Hao Wang , Li Xu , Weiyue Ding , Yiming Xu","doi":"10.1016/j.knosys.2025.114182","DOIUrl":"10.1016/j.knosys.2025.114182","url":null,"abstract":"<div><div>In the current field of physiological signal-based affective computing, capturing both local and global information from single-modal signals and effectively fusing multimodal signals remain significant challenges. Recently, Mamba-based models have attracted widespread attention due to their linear complexity and exceptional long-sequence modeling capabilities, yet existing Mamba-based models are primarily designed for single-modal tasks. This study introduces the Sliced Wasserstein Weighted Multimodal (SWWM) MambaVision, a novel multimodal fusion model designed to achieve more effective multimodal integration by leveraging the correlations and complementarities of physiological signals. This model inherits the high computational efficiency of Mamba’s State Space Model (SSM) and integrates a cross-window connection mechanism to capture the global information of single-modal physiological signals. Furthermore, this study constructs a dual-stream framework to fuse multimodal signals. In addition, a weighting mechanism based on the Sliced-Wasserstein (SW) distance is proposed, which uses the manifold structural features of physiological signals to calculate the distance metric of modal feature matrices, achieving a more flexible and effective multimodal fusion. The method was validated on the DEAP and DREAMER datasets, achieving average accuracies of 98.99 % and 97.58 %, respectively. The throughput was improved by 84 %, 22 %, and 8 % compared to Conv-Based, Transformer-Based, and Conv-Transformer-Based multimodal models, respectively. 
The results thoroughly demonstrate its performance advantages in multimodal physiological signal processing and open up new directions for further research in this field.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114182"},"PeriodicalIF":7.6,"publicationDate":"2025-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144771178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A willingness-aware session-based social recommendation method with heterogeneous global graph embedding","authors":"Xiongtao Zhang , Jianmin Xu","doi":"10.1016/j.knosys.2025.114181","DOIUrl":"10.1016/j.knosys.2025.114181","url":null,"abstract":"<div><div>Recent studies on Session-based Social Recommendation (SSR) employ social information to enhance the session interest and achieve next-item recommendation. However, existing studies exhibit two main limitations: first, the learning of session interests overlooks the collaborative signals inherent in user-item interactions; second, the enhancement of session interests fails to consider the willingness of the target user to adopt his or her friends’ interests. To this end, we develop a willingness-aware SSR method with heterogeneous global graph embedding, WSSR for short. Specifically, we construct a heterogeneous global graph based on item transitions as well as user-item interactions, and then introduce a heterogeneous graph neural network to learn richer embeddings of users and items. Moreover, we construct a dynamic social graph for the target user and develop a novel willingness-aware graph attention network to consider the willingness of the target user in enhancing the session interest. 
Extensive experiments are conducted on three real-world datasets in different domains, and empirical results verify that WSSR outperforms eight mainstream baselines on multiple measurements.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"328 ","pages":"Article 114181"},"PeriodicalIF":7.6,"publicationDate":"2025-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144772699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Meta-DDA: Meta-learning with diffusion and dual augmentation for few-shot text classification","authors":"Yongjun Wang , Yizhao Zhu , Hao Wang , Ru Wang , Huajuan Duan , Lei Guo , Peiyu Liu","doi":"10.1016/j.knosys.2025.114179","DOIUrl":"10.1016/j.knosys.2025.114179","url":null,"abstract":"<div><div>Few-shot learning can construct neural architectures endowed with rapid task adaptation capabilities under limited labeled data regimes while preserving the generalization efficacy across distribution shifts. Meta-learning frameworks that adopt bi-level optimization paradigms have emerged as predominant solutions for few-shot learning problems owing to their architectural parsimony and parameter efficiency. However, inherent hierarchical optimization dynamics (outer-loop meta-optimization over task-specific loss landscapes and inner-loop, gradient-based task adaptation) cause computational pathology via second-order gradient backpropagation across inner-loop trajectories. This induces sensitivity degradation in parameter initialization and gradient propagation instability, particularly under cross-task distributional disparities. To address these limitations, we propose Meta-DDA, a novel meta-learning framework that substitutes conventional gradient descent in the inner-loop with diffusion-based denoising trajectories. Meta-DDA effectively circumvents the numerical instability of second-order gradient backpropagation in traditional meta-learning by reconstructing inner-loop gradient optimization as a denoising trajectory of the task conditions. This significantly reduces the sensitivity to initialization by utilizing noise scheduling with progressive parameter updating of Gaussian prior to achieve more stable and robust optimization in scenarios with few samples. 
Furthermore, we develop dual data augmentation strategies that are compatible with the bi-level architecture: (1) task-level augmentor at the meta-level stage mitigates excessive parameter updates caused by task difficulty variance; (2) sample-level augmentor at the base learner stage augments task-specific feature learning. Extensive experiments on four text classification datasets and four intent recognition datasets demonstrate the superior performance of Meta-DDA with markedly improved cross-domain generalization.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114179"},"PeriodicalIF":7.6,"publicationDate":"2025-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144757871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interaction-aware vehicle trajectory prediction using spatial-temporal dynamic graph neural network","authors":"Ruiyu Wang , Wenxie Lin , Gang Ren , Qi Cao , Zhe Zhang , Yue Deng","doi":"10.1016/j.knosys.2025.114187","DOIUrl":"10.1016/j.knosys.2025.114187","url":null,"abstract":"<div><div>In the context of autonomous driving, vehicle trajectory prediction (VTP) plays a crucial role in enhancing safety and efficiency. Equipped with perception and communication devices, autonomous vehicles (AVs) can obtain trajectory information of surrounding vehicles, enabling more accurate trajectory predictions. Although recent interactive methods have achieved significant progress by modelling the interactions among neighboring vehicles, the VTP task still remains a challenging research issue due to the dynamic and heterogeneous nature of vehicle interaction. To address this issue, we propose a novel interaction-aware VTP model (IA-STDGNN) to simultaneously predict the trajectories of both target and surrounding vehicles. Specifically, we begin by designing a feature extraction module to extract vehicle motion state information through trajectory residual and discrete derivative operations. Next, we introduce instance normalization and linear integration modules to normalize the input data and deduce trend trajectories. Afterward, a vehicle interaction-based dynamic graph convolutional network is developed, incorporating single-lane and multi-lane vehicle interaction mechanisms to account for spatial interactions between vehicles. Building on this, a spatial-temporal feature dependency fusion module is designed to enhance the model's spatiotemporal representation capabilities further and effectively integrate spatial and temporal features. Finally, the trajectory prediction module produces multi-modal predictions, concatenating the output of the linear model to generate the final predicted trajectory. 
Extensive experiments conducted on the public datasets demonstrate that our method outperforms other state-of-the-art VTP approaches.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"327 ","pages":"Article 114187"},"PeriodicalIF":7.6,"publicationDate":"2025-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144770672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}