{"title":"One-shot adaptation for cross-domain semantic segmentation in remote sensing images","authors":"Jiaojiao Tan , Haiwei Zhang , Ning Yao , Qiang Yu","doi":"10.1016/j.patcog.2025.111390","DOIUrl":"10.1016/j.patcog.2025.111390","url":null,"abstract":"<div><div>Contemporary cross-domain remote sensing (RS) image segmentation has been successful in recent years. When the target domain data becomes scarce in some realistic scenarios, the performance of traditional domain adaptation (DA) methods significantly drops. In this paper, we tackle the problem of fast cross-domain adaptation by observing only one unlabeled target data. To deal with dynamic domain shift efficiently, this paper introduces a novel framework named Minimax One-shot AdapTation (<strong>MOAT</strong>) to perform cross-domain feature alignment in semantic segmentation. Specifically, MOAT alternately maximizes the cross-entropy to select the most informative source samples and minimizes the cross-entropy of obtained samples to make the model fit the target data. The selected source samples can effectively describe the target data distribution using the proposed uncertainty-based distribution estimation technique. We propose a memory-based feature enhancement strategy to learn domain-invariant decision boundaries to accomplish semantic alignment. Generally, we empirically demonstrate the effectiveness of the proposed MOAT. It achieves a new state-of-the-art performance on cross-domain RS image segmentation for conventional unsupervised domain adaptation and one-shot domain adaptation scenarios.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111390"},"PeriodicalIF":7.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143348360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing domain-generalizable ReID through non-parametric normalization","authors":"Amran Bhuiyan , Aijun An , Jimmy Xiangji Huang , Jialie Shen","doi":"10.1016/j.patcog.2025.111356","DOIUrl":"10.1016/j.patcog.2025.111356","url":null,"abstract":"<div><div>Optimizing deep neural networks to generalize effectively across diverse visual domains remains a key challenge in computer vision, especially in domain-generalizable person re-identification (ReID). The goal of domain-generalizable ReID is to develop robust deep learning (DL) models that are effective across both known (source) and unseen (target) domains. However, many top-performing ReID methods overfit to the source domain, impairing their generalization ability. Previous approaches have employed Instance Normalization (IN) with learnable parameters to generalize domains and eliminate source domain styles. Recently, some DL frameworks have adopted normalization techniques without learnable parameters. We critically examine non-parametric normalization techniques for optimizing the deep ReID model, emphasizing the advantages of using non-parametric instance normalization as a gating mechanism to extract style-independent features at various abstraction levels within both convolutional neural networks (CNNs) and Vision Transformers (ViT). Our framework offers strategic guidance on the optimal placement of non-parametric IN within the network architecture to ensure effective information flow management in subsequent layers. Additionally, we employ one-dimensional Batch Normalization (BN) without learnable parameters at deeper network levels to remove content-related biases from the source domain. Our integrated approach, termed <em>DualNormNP</em>, systematically optimizes the model’s capacity to generalize across varied domains. Comprehensive evaluations on multiple benchmark ReID datasets demonstrate that our approach surpasses current state-of-the-art ReID methods in terms of generalization performance. Code is available on Github: <span><span>https://github.com/mdamranhossenbhuiyan/DualNormNP</span><svg><path></path></svg></span></div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111356"},"PeriodicalIF":7.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAM-guided multi-level collaborative Transformer for infrared and visible image fusion","authors":"Lin Guo , Xiaoqing Luo , Yue Liu , Zhancheng Zhang , Xiaojun Wu","doi":"10.1016/j.patcog.2025.111391","DOIUrl":"10.1016/j.patcog.2025.111391","url":null,"abstract":"<div><div>The primary value of image fusion lies in supporting downstream task more effectively. However, the fusion representation of existing methods contains insufficient semantic information, thereby weakening the compatibility with subsequent task. To overcome this problem, a SAM-guided multi-level collaborative transformer for infrared and visible image fusion is proposed in this manuscript, termed as SpTFuse. Considering the strong zero-shot generalization ability of Segment Anything Model (SAM), a SAM-based semantic prior branch is introduced to interact with multi-scale visual representation branches for improving the completeness and compatibility of fusion representation. The interaction process is divided into three levels to progressively integrate multibranch information. At the first level, an inter-modal fusion block (IEB) is designed with a single-step collaborative transformer (SCT) and a modality integration module (MIM). The SCT aggregates the correlated features of semantic prior and visual representation. Then, the MIM is designed to fuse the SAM semantic prior guided multimodal visual representation. To balance the visual and semantic representations to obtain complete fusion representation, an intra-modal interaction block (IAB) is constructed at the following levels. Specifically, the IAB consists of a dual-path collaborative transformer (DCT) and a semantic enhancement module (SEM). The DCT constructs two paths in a cascade manner, where the prior collaborative path continues to acquire semantic prior, while the visual refinement path balances visual information while maintaining semantic completeness. Subsequently, SEM further combines semantic prior to enhance the completeness of the fused representation. To reduce the semantic information discarded during the image restoration process, the collaborative information of previous levels is incorporated into the corresponding decoder layers by the semantic compensation block. Finally, the proposed loss function includes semantic prior loss, gradient loss, and intensity loss. The experiments demonstrate the SpTFuse not only achieves effective fusion results, but also shows obvious advantages in downstream tasks such as segmentation and detection. The source code is available at <span><span>https://github.com/lxq-jnu/SpTFuse</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111391"},"PeriodicalIF":7.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Breaking boundaries: Low-precision conditional mutual information for efficient feature selection","authors":"Laura Morán-Fernández , Eva Blanco-Mallo , Konstantinos Sechidis , Verónica Bolón-Canedo","doi":"10.1016/j.patcog.2025.111375","DOIUrl":"10.1016/j.patcog.2025.111375","url":null,"abstract":"<div><div>As internet-of-things (IoT) devices proliferate, the need for efficient data processing at the network edge becomes increasingly critical due to the vast amounts of data generated. This paper presents a groundbreaking approach that leverages edge computing to address these challenges, using low-precision conditional mutual information (CMI) for feature selection. Our novel methodology improves the efficiency of edge computing systems by significantly reducing memory and energy consumption while maintaining high accuracy. We adapt this approach to feature selection algorithms, specifically, conditional mutual information maximization (CMIM) and incremental association Markov blanket (IAMB), and demonstrate its effectiveness for diverse datasets, including complex DNA microarrays. Our results show that low-precision methods not only compare competitively with traditional 64-bit implementations, but also yield significant performance and resource savings. For IoT and other machine learning applications, this work represents a significant advance in the development of more sustainable and efficient algorithms that can optimize computational resources and reduce their environmental impact.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111375"},"PeriodicalIF":7.5,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolutionary structure learning on temporal networks using von Neumann entropy","authors":"Shenglong Liu , Yingyue Zhang , Qiyao Huang , Zhihong Zhang","doi":"10.1016/j.patcog.2025.111370","DOIUrl":"10.1016/j.patcog.2025.111370","url":null,"abstract":"<div><div>Temporal networks effectively represent diverse real-world dynamic systems, but their evolving nature poses challenges in developing robust models. Message-passing mechanisms in these models face increasing computational complexity as nodes process expanding neighborhoods over time. To address this issue, we introduce von Neumann entropy, an effective graph representation of static graph structure. By approximately computing von Neumann entropy on the temporal network, an <strong>E</strong>volutionary <strong>S</strong>tructure <strong>A</strong>ware <strong>N</strong>etwork (ESAN) framework is proposed for evolutionary structure recognition. ESAN leverages von Neumann entropy to identify and emphasize key structural changes, enabling insightful analysis of network evolution. Specifically, ESAN employs an evolutionary structure importance sampling algorithm to capture evolution laws by measuring von Neumann entropy changes. Relative structure information encoding further enhances edge structural information. Extensive evaluations on transductive and inductive link prediction tasks demonstrate the superiority of ESAN against state-of-the-art baselines.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111370"},"PeriodicalIF":7.5,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DBMSTN: A Dual Branch Multiscale Spatio-Temporal Network for dim-small target detection in infrared image","authors":"Na Li , Xiangyu Yang , Huijie Zhao","doi":"10.1016/j.patcog.2025.111372","DOIUrl":"10.1016/j.patcog.2025.111372","url":null,"abstract":"<div><div>Addressing the challenging task of infrared dim and small target (IDST) detection in complex background, which is a major topic in infrared image processing, we propose a Dual Branch Multiscale Spatio-Temporal Network (DBMSTN) to suppress complex background and effectively extract targets’ geometric and motion features. Firstly, DBMSTN utilizes a multiscale spatial feature extraction module that extracts inter-frame difference and saliency feature to highlight small targets at different scales and suppress complex backgrounds. Secondly, the DBMSTN contains a dual-branch spatio-temporal feature extraction module which is designed with improved gating unit in convolutional LSTM (ConvLSTM) to enhance the extraction of motion features to cope with their uncertainty. In addition, DBMSTN achieves a better performance using a fusing module that fuses multilevel spatio-temporal features. It also employs the weighted mean squared error (MSE) loss function with adjustable weights of positive and negative samples to solve the data imbalance problem. Experiments based on two public benchmarks verify that DBMSTN outperforms the state-of-the-art metrics and achieves the highest F1 up to 0.9860, also effectively extracts spatio-temporal features of targets with different speeds.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111372"},"PeriodicalIF":7.5,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretable categorical data clustering via hypothesis testing","authors":"Lianyu Hu , Mudi Jiang , Junjie Dong , Xinying Liu , Zengyou He","doi":"10.1016/j.patcog.2025.111364","DOIUrl":"10.1016/j.patcog.2025.111364","url":null,"abstract":"<div><div>Categorical data clustering algorithms are extensively investigated but it is still challenging to explain or understand their output clusters. Hence, it is highly demanded to develop interpretable clustering algorithms that are capable of explaining categorical clusters in terms of decision trees or rules. However, most existing interpretable clustering algorithms focus on numeric data and the development of corresponding algorithms for categorical data is still in the infant stage. In this paper, we tackle the problem of interpretable categorical data clustering by growing a binary decision tree in an unsupervised manner. We formulate the candidate split evaluation issue as a multiple hypothesis testing problem, where the null hypothesis posits that there is no association between each attribute and the candidate split. Subsequently, the <span><math><mi>p</mi></math></span>-value for each candidate split is calculated by aggregating individual test statistics from all attributes. Thereafter, a significance-based splitting criteria is established. This involves choosing an optimal split with the smallest <span><math><mi>p</mi></math></span>-value for tree growth and using a significance level to stop the non-significant split. Extensive experimental results on real-world data sets demonstrate that our algorithm achieves comparable performance in terms of cluster quality and explainability relative to those of state-of-the-art counterparts.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111364"},"PeriodicalIF":7.5,"publicationDate":"2025-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149794","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SG-CLR: Semantic representation-guided contrastive learning for self-supervised skeleton-based action recognition","authors":"Ruyi Liu , Yi Liu , Mengyao Wu , Wentian Xin , Qiguang Miao , Xiangzeng Liu , Long Li","doi":"10.1016/j.patcog.2025.111377","DOIUrl":"10.1016/j.patcog.2025.111377","url":null,"abstract":"<div><div>Contrastive learning and multimodal representation learning have been widely applied to skeleton-based action recognition. However, the majority of the research focuses on the mining of spatial- temporal features while ignoring the semantic information of action. To deal with these drawbacks, we propose a novel contrastive learning framework (SG-CLR) for skeleton-based action recognition, which captures fine-grained multi-level discriminative features by incorporating both semantic compensation and spatial–temporal feature reinforcement. For semantic compensation contrastive learning, in order to achieve dynamic compensation of high-order semantic information, combining LLMs-generated action descriptions with multi-modal encoders to integrate cross-modal multivariate features (<em>e.g.,</em> skeleton and text features). For spatial–temporal enhancement contrastive learning, SkeleMask augmentation is proposed to mine more high-level temporal movement information. Experiments demonstrate that the proposed SG-CLR achieves the state-of-the-art performance on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets. Related code will be available at <span><span>https://github.com/QingZhiWMY/SG-CLR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111377"},"PeriodicalIF":7.5,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A vision and language hierarchical alignment for multimodal aspect-based sentiment analysis","authors":"Wang Zou, Xia Sun, Qiang Lu, Xuxin Wang, Jun Feng","doi":"10.1016/j.patcog.2025.111369","DOIUrl":"10.1016/j.patcog.2025.111369","url":null,"abstract":"<div><div>In recent years, Multimodal Aspect-Based Sentiment Analysis (MABSA) has garnered attention from researchers. The MABSA technology can effectively perform Aspect Term Extraction (MATE) and Aspect Sentiment Classification (MASC) for Multimodal data. However, current MABSA work focuses on visual semantic information while neglecting the scene structure of images. Additionally, researchers using static alignment matrices cannot effectively capture complex vision features, such as spatial and action features among objects. In this paper, we propose a Vision and Language Hierarchical Alignment method (VLHA) for the MABSA task. The VLHA framework includes three modules: the multimodal structural alignment module, the multimodal semantic alignment module, and the cross-modal MABSA module. Firstly, we process the vision modality into a visual scene graph and image patches, and the text modality into a text dependency graph and word sequences. Secondly, we use the structural alignment module to achieve dynamic alignment learning between the visual scene graph and text dependency graph, and the semantic alignment module to achieve dynamic alignment learning between image patches and word sequences. Finally, we concatenate and fuse structural and semantic features in the cross-modal MABSA module. Additionally, VLHA designs a three-dimensional dynamic alignment matrix to guide the cross-attention for modal interaction learning. We conducted a series of experiments on two Twitter datasets, and the results show that the performance of the VLHA framework outperforms the baseline models. The structure of the visual modality facilitates the model in comprehensively understanding complex visual information.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111369"},"PeriodicalIF":7.5,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tensorial multi-view subspace clustering based on logarithmic n-order penalty","authors":"Chang Liu , Hongbing Zhang , Hongtao Fan , Yajing Li","doi":"10.1016/j.patcog.2025.111384","DOIUrl":"10.1016/j.patcog.2025.111384","url":null,"abstract":"<div><div>The low-rank tensor can demonstrate the internal features of the data and explore the higher-order correlations between multi-view data, and has been successfully used in multi-view clustering. However, most of the existing methods use the tensor nuclear norm (TNN) as a convex approximation of the tensor rank function. The TNN approach applies the same weight information to all singular values, which usually leads to a sub-optimal tensor representation. In this paper, a new non-convex logarithmic <span><math><mi>n</mi></math></span>-order penalty (LNP) function is designed, which can fully take into account the differences existing between different singular values. What is more, a tensorial multi-view subspace clustering model about LNP (LNP-TMSC) is proposed. The norm based on LNP can integrate the structural information encoded by larger singular values. Thus a more compact tensor with LR properties can be learned that can fully preserve the consistency between views and explore the high-order correlation. The proposed model is solved using the alternating direction method of multipliers and the resulting algorithm is proved theoretically to converge to a Karush–Kuhn–Tucker point. The experimental results demonstrate the validity and superiority of the LNP-TMSC model.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"162 ","pages":"Article 111384"},"PeriodicalIF":7.5,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}