{"title":"AM40: Enhancing action recognition through matting-driven interaction analysis","authors":"Siqi Liang , Wenxuan Liu , Zhe Li , Kui Jiang , Siyuan Yang , Chia-Wen Lin , Xian Zhong","doi":"10.1016/j.patcog.2025.112393","DOIUrl":"10.1016/j.patcog.2025.112393","url":null,"abstract":"<div><div>Action recognition models frequently face challenges from complex video backgrounds, where actors may blend into their surroundings and complicate motion analysis. Human interactions with action-related elements vary across scenarios, with backgrounds serving as both contextual cues and sources of interference. To address these issues, we introduce video matting techniques to separate foreground subjects from the background. This enables the model to focus on the subject of interest while suppressing irrelevant regions, thereby enhancing the extraction of interactions between the subject and associated objects. To support this methodology, we present <span>ActionMatting40</span> (<span>AM40</span>) dataset, which comprises 40 action categories annotated with alpha mattes to distinguish human actions and related objects from the background. Furthermore, we propose Matting-Driven Interaction Recognition (MIR), integrating an Action Background Decoupling (ABD) module to mitigate background interference and a Semantic-aware Feature Communication (SFC) module to selectively extract informative features for improved action recognition. Our code and dataset are publicly available at <span><span>https://github.com/lwxfight/actionmatting</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112393"},"PeriodicalIF":7.6,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145109767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoding the brain via multi-view brain topology contrastive learning","authors":"Ziyu Li , Zhiyuan Zhu , Qing Li , Xia Wu","doi":"10.1016/j.patcog.2025.112445","DOIUrl":"10.1016/j.patcog.2025.112445","url":null,"abstract":"<div><div>Recently, Graph Neural Networks (GNNs) have been widely used in neural decoding due to strong topological feature mining and interpretability. GNNs are heavily based on manually defined brain topology; if there are false connections or noise, it will greatly affect the decoding performance. To address the aforementioned challenges, a series of GNN-based graph topology learning (GTL) methods have received widespread attention due to their ability to automatically optimize brain topology. However, existing GTL methods are usually implemented in a supervised manner and rely on a large amount of annotated data, making it difficult to directly transfer them to different decoding scenarios. Therefore, in this paper, a Brain Topology Inference framework based on Multi-View Contrastive Self-supervised Learning (BTI-MVCSL) is proposed for neural decoding. Specifically, BTI-MVCSL first designs a series of graph learners, which can infer brain topological connections as “learner”, generate topology learning objectives as “instructor” from the original fMRI data, and maximize consistency between “instructor” and “learner” to extract the rich information in hidden connections. Furthermore, in order to achieve fully automated topology learning guidance, BTI-MVCSL develops a new self-learning mechanism that can use the “learner”-view brain topology to update the “instructor”-view brain topology during model optimization and further achieves comparative constraints through the “instructor” topology. The proposed BTI-MVCSL has been extensively evaluated in two publicly available fMRI datasets, demonstrating superior performance and revealing potential changes in brain topology under different decoding tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112445"},"PeriodicalIF":7.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145158619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simple yet lightweight module for enhancing domain generalization through relative representation","authors":"Meng Cao , Songcan Chen","doi":"10.1016/j.patcog.2025.112423","DOIUrl":"10.1016/j.patcog.2025.112423","url":null,"abstract":"<div><div>Domain Generalization (DG) learns a model from multiple source domains to combat individual domain differences and ensure generalization to unseen domains. Most existing methods focus on learning domain-invariant <em>absolute</em> representations. However, we empirically observe that such representations often suffer from notable distribution divergence, leading to unstable performance in diverse unseen domains. In contrast, <em>relative</em> representations, constructed w.r.t. a set of anchors, naturally capture geometric relationships and exhibit intrinsic stability within a dataset. Despite this potential, their application to DG remains largely unexplored, due to their common transductive assumption that anchors require access to target-domain data, which is incompatible with the inductive setting of DG. To address this issue, we design Re2SL, a simple and lightweight plug-in module that follows a pre-trained encoder and constructs anchors solely from source-domain prototypes, thereby ensuring a completely inductive design. To our knowledge, Re2SL is the first to explore relative representation for DG. This design is inspired by the insight that <strong>ReS</strong>idual differences between absolute and domain-specific representations can spontaneously seek stable representations within the same distribution shared across <em>all domains</em>. Leveraging these stable representations, we construct cross-domain <strong>ReL</strong>ative representation to enhance stability and transferability without accessing any target data during training or anchor computation. Empirical studies show that our constructed representation exhibits minimal <span><math><mi>H</mi></math></span>-divergence, confirming its stability. Notably, Re2SL achieves up to 4.3 % improvement while reducing computational cost by 90 %, demonstrating its efficiency.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112423"},"PeriodicalIF":7.6,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145220347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-frequency shared-feature-learning based diffusion model for removing surgical smoke","authors":"Hao Li , Xiangyu Zhai , Ziwei Liang , Jie Xue , Bin Jin , Haitao Niu , Guangyong Zhang , Huanxin Ding , Dengwang Li , Pu Huang","doi":"10.1016/j.patcog.2025.112447","DOIUrl":"10.1016/j.patcog.2025.112447","url":null,"abstract":"<div><div>Surgical smoke in laparoscopic surgery can deteriorate visibility for surgeons. This work aims to simultaneously remove the surgical smoke and restore true-to-life image colors with deep learning. However, deep learning-based smoke removal remains a challenge due to: 1) the non-homogeneous distribution of surgical smoke, 2) higher frequency modes being hindered from being learned due to spectral bias. In this work, we propose the multi-frequency shared-feature-learning based conditional diffusion model with adaptive smoke attention for removing surgical smoke. The proposed model learns to map both the smoky and smokeless images into a shared inherent feature by the forward learning and synthesize the smokeless image by the reverse learning, and the input noisy image used for the forward learning is wrapped by the smoke attention learning to ease sampling steps and facilitate shared feature optimization. The smoke attention learning employs smoke segmentation and convolutional block attention modules to capture the non-homogeneous features of smoke. The multi-frequency learning is introduced to incorporate with shared feature learning to enhance the mid-to-high frequency features. In addition, the multi-task learning incorporates shared feature loss, smoke perception loss, dark channel prior loss, and contrast enhancement loss to help the model optimization. The experimental results show that the proposed method outperforms other state-of-the-art methods on both synthetic/real laparoscopic surgical images, with the potential to be embedded in laparoscopic devices for de-smoking.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112447"},"PeriodicalIF":7.6,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145109810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Entropy-informed weighting channel normalizing flow for deep generative models","authors":"Wei Chen , Shian Du , Shigui Li , Delu Zeng , John Paisley","doi":"10.1016/j.patcog.2025.112442","DOIUrl":"10.1016/j.patcog.2025.112442","url":null,"abstract":"<div><div>Normalizing Flows (NFs) are widely used in deep generative models for their exact likelihood estimation and efficient sampling. However, they require substantial memory since the latent space matches the input dimension. Multi-scale architectures address this by progressively reducing latent dimensions while preserving reversibility. Existing multi-scale architectures use simple, static channel-wise splitting, limiting expressiveness. To improve this, we introduce a regularized, feature-dependent <span><math><mi>Shuffle</mi></math></span> operation and integrate it into vanilla multi-scale architecture. This operation adaptively generates channel-wise weights and shuffles latent variables before splitting them. We observe that such operation guides the variables to evolve in the direction of entropy increase, hence we refer to NFs with the <span><math><mi>Shuffle</mi></math></span> operation as <em>Entropy-Informed Weighting Channel Normalizing Flow</em> (EIW-Flow). Extensive experiments on CIFAR-10, CelebA, ImageNet, and LSUN demonstrate that EIW-Flow achieves state-of-the-art density estimation and competitive sample quality for deep generative modeling, with minimal computational overhead.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112442"},"PeriodicalIF":7.6,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145106064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Layer-wise correlation and attention discrepancy distillation for semantic segmentation","authors":"Jianping Gou , Kaijie Chen , Cheng Chen , Weihua Ou , Xin Luo , Zhang Yi","doi":"10.1016/j.patcog.2025.112438","DOIUrl":"10.1016/j.patcog.2025.112438","url":null,"abstract":"<div><div>Knowledge distillation (KD) has recently garnered increased attention in segmentation tasks due to its effective balance between accuracy and computational efficiency. Nonetheless, existing methods mainly rely on structured knowledge from a single layer, overlooking the valuable discrepant knowledge that captures the diversity and distinctiveness of features across various layers, which is essential for the KD process. We present Layer-wise Correlation and Attention Discrepancy Distillation (LCADD) to tackle this issue, training compact and accurate semantic segmentation networks by considering layer-wise discrepancy knowledge. Specifically, we employ two distillation schemes: (i) correlation discrepancy distillation, which constructs a pixel-wise correlation discrepancy matrix across various layers to seize more detailed spatial dependencies, and (ii) attention discrepancy self-distillation, which aims to guide the shallower layers of the student network to emulate the attention discrepancy maps of the deeper layers, facilitating self-learning of attention discrepancy knowledge within the student network. Each proposed method is designed to work collaboratively in learning discrepancy knowledge, allowing the student network to better imitate the teacher from the perspective of layer-wise discrepancy. Our method has demonstrated superior performance on various semantic segmentation datasets, including Cityscapes, Pascal VOC 2012, and CamVid, compared to the latest knowledge distillation techniques, thereby validating its effectiveness.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112438"},"PeriodicalIF":7.6,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Condense loss: Exploiting vector magnitude during person Re-identification training process","authors":"Xi Yang , Wenjiao Dong , Yingzhi Tang , Gu Zheng , Nannan Wang , Xinbo Gao","doi":"10.1016/j.patcog.2025.112443","DOIUrl":"10.1016/j.patcog.2025.112443","url":null,"abstract":"<div><div>The magnitudes of features and weights significantly affect the gradients during the training process. L2 normalized softmax losses (such as NormFace, CosFace, ArcFace, etc.) and Naive softmax losses both reduce the magnitudes of image features in the training process and achieve good results in face recognition and person re-identification tasks, respectively. In this paper, we fully utilize the feature vector magnitudes and propose Condense loss for Re-ID tasks, which replaces the inner production of Naive softmax loss with the negative Euclidean distance. Condense loss generates negative radial gradients when updating weight parameters to push all features compacter. Because the coefficients of tangential gradients (the tangential component of the gradients) are related to feature magnitudes, it ideally provides monotonically decreasing tangential gradients, resulting in gradually diminishing updates that enhance the stability of the training process. We also introduce a margin parameter into Condense loss to enlarge inter-class distances and thus help the model learn more discriminative features. Mathematical analysis is given in this paper, and we have conducted sufficient experiments focusing on Re-ID tasks to prove the corresponding conclusion. The experimental results demonstrate that the Condense loss achieves competitive results compared to the state-of-the-art methods in the person re-identification task. At the same time, it also has a good performance in face recognition tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112443"},"PeriodicalIF":7.6,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preference isolation forest for structure-based anomaly detection","authors":"Filippo Leveni , Luca Magri , Cesare Alippi , Giacomo Boracchi","doi":"10.1016/j.patcog.2025.112405","DOIUrl":"10.1016/j.patcog.2025.112405","url":null,"abstract":"<div><div>We address the problem of detecting anomalies as samples that do not conform to structured patterns represented by low-dimensional manifolds. To this end, we conceive a general anomaly detection framework called Preference Isolation Forest (<span>PIF</span>), that combines the benefits of adaptive isolation-based methods with the flexibility of preference embedding. The key intuition is to embed the data into a high-dimensional preference space by fitting low-dimensional manifolds, and to identify anomalies as isolated points. We propose three isolation approaches to identify anomalies: <em>i</em>) Voronoi-<span>iForest</span>, the most general solution, <em>ii</em>) <span>RuzHash</span>-<span>iForest</span>, that avoids explicit computation of distances via Local Sensitive Hashing, and <em>iii</em>) Sliding-<span>PIF</span>, that leverages a locality prior to improve efficiency and effectiveness.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112405"},"PeriodicalIF":7.6,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145106065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A progressive attention network with transformer for multi-label image recognition","authors":"Sulan Zhang , Zhenwen Liao , Jianeng Li , Lihua Hu , Jifu Zhang","doi":"10.1016/j.patcog.2025.112439","DOIUrl":"10.1016/j.patcog.2025.112439","url":null,"abstract":"<div><div>Recent research typically improves the performance of multi-label image recognition by constructing higher-order pairwise label correlations. However, these methods lack the ability to effectively learn multi-scale features, which makes it difficult to distinguish small-scale objects. Moreover, most current attention-based methods to capture local salient features may ignore many useful non-salient features. To address the aforementioned issues, we propose a Transformer-based Progressive Attention Network (TPANet) for multi-label image recognition. Specifically, we first design a new adaptive multi-scale feature attention (AMSA) module to learn cross-scale features in multi-level features. Then, to excavate various useful object features, we introduce the transformer encoder to construct a semantic spatial attention (ESA) module and also propose a context-aware feature enhanced (CAFE) module. The former ESA module is used to discover complete object regions and capture discriminative features, and the latter CAFE module leverages object-local features to enhance pixel-level global features. The proposed TPANet model can generate more accurate object labels in three popular benchmark datasets (i.e., MS-COCO 2014, Pascal VOC 2007 and Visual Genome), and is competitive to state-of-the-art models (e.g., SST and FL-Tran, etc.).</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112439"},"PeriodicalIF":7.6,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}