SAPFormer: Shape-aware propagation Transformer for point clouds
Gang Xiao, Sihan Ge, Yangsheng Zhong, Zhongcheng Xiao, Junfeng Song, Jiawei Lu
Pattern Recognition, vol. 164, Article 111578 (12 March 2025). DOI: 10.1016/j.patcog.2025.111578
Abstract: Transformer-based networks have achieved impressive performance on three-dimensional point cloud data. However, most existing methods aggregate local features within point-cloud neighborhoods and ignore global feature information, making it difficult to capture long-range dependencies. In this paper, we propose the Shape-Aware Propagation Transformer (SAPFormer), which flexibly captures the semantic information of point clouds in geometric space and effectively extracts contextual geometric information. Specifically, we first design local group self-attention (LGA) to capture local interactions within each region. To relate features across separated local regions, we propose local group propagation (LGP), which passes information between regions via query points, allowing features to propagate among neighbors for finer-grained representations. To further enlarge the receptive field, we propose the global shape feature module (GSFM), which learns global context from key shape points (KSPs). Finally, to recover positional cues between global contexts, we introduce spatial-shape relative position encoding (SS-RPE), which encodes the positional relationships between points. Extensive experiments demonstrate the effectiveness and superiority of our method on the S3DIS, SensatUrban, ScanNet V2, ShapeNetPart, and ModelNet40 datasets.
Code: https://github.com/viivan/SAPFormer-main

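The LGA described above restricts self-attention to the points of a single region. As a rough illustration only (not the authors' implementation; the learned Q/K/V projections, multiple heads, and the propagation steps are omitted), plain scaled dot-product attention confined to index groups can be sketched in NumPy:

```python
import numpy as np

def local_group_attention(feats, groups, d_k=None):
    """Single-head self-attention applied independently inside each point group.

    feats:  (N, C) point features.
    groups: list of index arrays, one per local region.
    Q/K/V projections are omitted (identity) to keep the sketch minimal.
    """
    d_k = d_k or feats.shape[1]
    out = feats.copy()
    for idx in groups:
        q = k = v = feats[idx]                       # (m, C)
        scores = q @ k.T / np.sqrt(d_k)              # (m, m) scaled dot products
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
        out[idx] = attn @ v                          # aggregate within the group
    return out
```

Each output point is a convex combination of the features in its own region only, which is what distinguishes local group attention from global attention over all N points.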
MPE: Multi-frame prediction error-based video anomaly detection framework for robust anomaly inference
Yujun Kim, Young-Gab Kim
Pattern Recognition, vol. 164, Article 111595 (11 March 2025). DOI: 10.1016/j.patcog.2025.111595
Abstract: As video surveillance becomes increasingly widespread, the need for video anomaly detection to support surveillance-related tasks has grown significantly. We propose a novel multi-frame prediction error-based framework (MPE) to improve anomaly detection accuracy and efficiency. MPE mitigates false positives in prediction models by leveraging multi-frame prediction errors, and reduces the time required to generate them through a frame prediction error storage method. The core idea is to reduce the prediction error of a normal frame while increasing that of an abnormal frame by leveraging the prediction errors of adjacent frames. We evaluated our method on the Ped2, Avenue, and ShanghaiTech datasets. The experimental results show that MPE improves the frame-level area under the curve (AUC) of prediction models while maintaining low computational overhead across all datasets, making prediction models robust and efficient for video anomaly detection in real-world scenarios.

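The abstract does not specify how adjacent-frame errors are combined; one minimal way to realize the idea (a plain windowed average, purely illustrative and not the paper's exact scheme) is to score each frame by the mean error of its temporal neighborhood, which damps isolated spikes on normal frames while sustained errors around an anomaly stay high:

```python
import numpy as np

def multi_frame_score(errors, window=2):
    """Aggregate per-frame prediction errors over adjacent frames.

    errors: (T,) per-frame prediction errors from any prediction model.
    window: number of neighbours on each side included in the average.
    """
    errors = np.asarray(errors, dtype=float)
    T = len(errors)
    scores = np.empty(T)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        scores[t] = errors[lo:hi].mean()  # windowed mean, clipped at the ends
    return scores
```

With `window=1`, a one-frame spike of 10 in an otherwise clean sequence is pulled down to 10/3, while three consecutive errors of 10 keep a central score of 10.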
Neural Normalized Cut: A differential and generalizable approach for spectral clustering
Wei He, Shangzhi Zhang, Chun-Guang Li, Xianbiao Qi, Rong Xiao, Jun Guo
Pattern Recognition, vol. 164, Article 111545 (11 March 2025). DOI: 10.1016/j.patcog.2025.111545
Abstract: Spectral clustering, a popular tool for data clustering, requires an eigen-decomposition step on a given affinity matrix to obtain the spectral embedding, a step that lacks generalizability and scalability. Moreover, the obtained spectral embedding can hardly provide a good approximation to the ground-truth partition, so a k-means step is adopted to quantize it. In this paper, we propose a simple yet effective, scalable, and generalizable approach, called Neural Normalized Cut (NeuNcut), to learn the clustering membership for spectral clustering directly. NeuNcut reparameterizes the unknown cluster membership via a neural network and trains that network via stochastic gradient descent with a properly relaxed normalized cut loss. As a result, NeuNcut can directly infer the clustering membership of out-of-sample, unseen data, providing an efficient way to handle clustering tasks with ultra-large-scale data. We conduct extensive experiments on both synthetic data and benchmark datasets, and the results validate the effectiveness and superiority of our approach.

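The normalized cut objective admits a standard soft relaxation: for a row-stochastic membership matrix Y, Ncut(Y) = sum over clusters c of 1 - assoc(A_c, A_c) / assoc(A_c, V), which is differentiable in Y and hence usable as a training loss for a network whose softmax output plays the role of Y. A minimal NumPy sketch of this relaxation (the paper's exact loss and network reparameterization may differ):

```python
import numpy as np

def relaxed_ncut_loss(Y, W, eps=1e-12):
    """Relaxed normalized-cut loss for a soft cluster assignment.

    Y: (N, k) row-stochastic soft memberships (e.g. a network's softmax output).
    W: (N, N) symmetric affinity matrix.
    The loss is 0 for a hard assignment that cuts no edges and grows as
    clusters share affinity mass.
    """
    d = W.sum(axis=1)                             # node degrees
    assoc_cc = np.einsum('nc,nm,mc->c', Y, W, Y)  # within-cluster association
    assoc_cv = Y.T @ d                            # cluster volume assoc(A_c, V)
    return float(np.sum(1.0 - assoc_cc / (assoc_cv + eps)))
```

On a two-block affinity matrix, the correct hard partition scores (near) zero, while a uniform 50/50 assignment is penalized, which is the gradient signal the network trains against.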
Anisotropic multiresolution analyses for deepfake detection
Wei Huang, Michelangelo Valsecchi, Michael Multerer
Pattern Recognition, vol. 164, Article 111551 (11 March 2025). DOI: 10.1016/j.patcog.2025.111551
Abstract: Generative Adversarial Networks (GANs) can be misused to fabricate elaborate lies, and the threat they pose has sparked the need to discern genuine from fabricated content. We argue that since GANs primarily use isotropic convolutions to generate their output, they leave clear traces, their fingerprint, in the coefficient distribution on sub-bands extracted by anisotropic multiresolution transforms. We employ the fully separable wavelet transform and anisotropic multiwavelets to obtain anisotropic features, which we feed to lightweight convolutional neural network classifiers. The proposed approach considerably improves the state of the art in detecting fully GAN-generated images and is particularly resilient to common perturbations such as compression, noise, or blur. We find that anisotropic transforms, when combined with XceptionNet, also significantly enhance the state of the art in detecting partially manipulated images.

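A fully separable wavelet transform decomposes the row and column axes independently, possibly to different depths, which is what makes the resulting sub-bands anisotropic. A minimal Haar-based NumPy sketch (illustrative only; the paper additionally uses anisotropic multiwavelets, and the depth parameters here are arbitrary):

```python
import numpy as np

def haar_1d(x, axis):
    """One orthonormal Haar analysis step along an axis: (low | high) halves."""
    x = np.moveaxis(x, axis, 0)
    lo = (x[0::2] + x[1::2]) / np.sqrt(2)
    hi = (x[0::2] - x[1::2]) / np.sqrt(2)
    return np.moveaxis(np.concatenate([lo, hi]), 0, axis)

def fully_separable_haar(img, levels_rows=1, levels_cols=1):
    """Fully separable wavelet transform: rows and columns are decomposed
    independently, to possibly different depths, yielding the anisotropic
    sub-bands whose coefficient statistics serve as detector features.
    Sizes must be divisible by 2**levels along each axis.
    """
    out, (n0, n1) = img.astype(float).copy(), img.shape
    for j in range(levels_rows):
        out[: n0 >> j] = haar_1d(out[: n0 >> j], axis=0)  # cascade on low half
    for j in range(levels_cols):
        out[:, : n1 >> j] = haar_1d(out[:, : n1 >> j], axis=1)
    return out
```

Because each Haar step is orthonormal, the transform preserves total energy, and a constant image concentrates all of it in the single coarsest coefficient.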
Concept-guided domain generalization for semantic segmentation
Muxin Liao, Wei Li, Chengle Yin, Yuling Jin, Yingqiong Peng
Pattern Recognition, vol. 164, Article 111550 (11 March 2025). DOI: 10.1016/j.patcog.2025.111550
Abstract: Recent domain-generalized semantic segmentation methods use vision foundation models (VFMs) to achieve superior performance in unseen domains. However, unlike human vision, which naturally adapts to recognize objects in different contexts, VFMs still suffer from the distribution-shift problem. We therefore propose a concept-guided domain generalization (CDG) approach for semantic segmentation. First, since humans can recognize objects in various environments once they have learned the concept of an object, a concept token learning module is proposed to learn semantic concept tokens from semantic prototypes, aiming to exploit domain-invariant, instance-aware knowledge. Second, when object recognition is uncertain, humans fall back on contextual information; thus, a concept-contextual calibration strategy is proposed that uses the semantic concepts to generate concept-contextual relations, calibrating uncertain regions to refine the final predictions. Extensive experiments demonstrate that the proposed approach achieves superior performance on multiple benchmarks.
Code: https://github.com/seabearlmx/CDG

HaarFuse: A dual-branch infrared and visible light image fusion network based on Haar wavelet transform
Yuequn Wang, Jie Liu, Jianli Wang, Leqiang Yang, Bo Dong, Zhengwei Li
Pattern Recognition, vol. 164, Article 111594 (11 March 2025). DOI: 10.1016/j.patcog.2025.111594
Abstract: Infrared-visible image fusion remains challenging due to the inherent conflict between preserving multi-modal complementary features and minimizing reconstruction loss; existing methods often suffer from inadequate feature representation and information degradation during fusion. To address this, we propose HaarFuse, a wavelet-enhanced auto-encoder network that hierarchically integrates multi-scale features for robust fusion. The network first employs a wavelet transform to extend the receptive field of the convolutional layers, extracting shared shallow features that encode both low-frequency structural contours and high-frequency texture primitives. The shallow features are then decomposed into high- and low-frequency components via the Haar wavelet transform, and techniques such as INN, Gabor layers, and Transformers are adopted to further process these features. Finally, the fused image is reconstructed via the inverse wavelet transform. Experiments on the TNO, MSRS, and M3FD benchmarks validate HaarFuse's superiority: it achieves the highest thermal saliency (SD = 45.78, +5.5% on MSRS; EN = 6.98, +4.0% on M3FD), optimal edge fidelity (Qabf = 0.62, +1.6% on M3FD), and 34.2× faster inference than SwinFusion with 0.468 MB of parameters. Further validation in machine vision and medical imaging confirms its robustness for real-time applications.

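The analysis/synthesis pair such a wavelet auto-encoder is built on can be illustrated with a single-level 2D Haar transform. This sketch (not the authors' code) splits an image into the LL/LH/HL/HH sub-bands that the high- and low-frequency branches would process, and reconstructs the input exactly via the inverse transform:

```python
import numpy as np

def haar2d(img):
    """Single-level 2D Haar analysis: returns (LL, LH, HL, HH) sub-bands,
    each of half the input resolution (averaging normalization)."""
    a, b = img[0::2, :], img[1::2, :]
    lo, hi = (a + b) / 2, (a - b) / 2  # vertical low/high pass
    ll, lh = (lo[:, 0::2] + lo[:, 1::2]) / 2, (lo[:, 0::2] - lo[:, 1::2]) / 2
    hl, hh = (hi[:, 0::2] + hi[:, 1::2]) / 2, (hi[:, 0::2] - hi[:, 1::2]) / 2
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: perfect reconstruction of the input image."""
    h, w = ll.shape
    lo, hi = np.empty((h, 2 * w)), np.empty((h, 2 * w))
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh   # undo horizontal split
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    img = np.empty((2 * h, 2 * w))
    img[0::2, :], img[1::2, :] = lo + hi, lo - hi  # undo vertical split
    return img
```

LL carries the low-frequency structural contours and LH/HL/HH the high-frequency texture primitives mentioned in the abstract; perfect reconstruction is what lets the decoder invert the decomposition without loss.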
Self-randomized focuses effectively boost metric-based few-shot classifiers
Zhen Li, Zhongyuan Liu, Dongliang Chang, Aneeshan Sain, Xiaoxu Li, Zhanyu Ma, Jing-Hao Xue, Yi-Zhe Song
Pattern Recognition, vol. 164, Article 111538 (11 March 2025). DOI: 10.1016/j.patcog.2025.111538
Abstract: Deep metric learning is the de facto approach to few-shot image classification: a deep metric model is trained on base data and evaluated on novel data without any fine-tuning. Work on enhancing model performance has mostly focused on improving feature or class representations or on designing or learning new metrics, largely ignoring data-augmentation techniques for few-shot learning. Interestingly, we discover that augmentation strategies such as Cutout, Mixup, and CutMix greatly enhance the performance of few-shot models. We conjecture that this is because such techniques encourage the model to spread its focus across multiple discriminative regions of an object instead of restricting it to the single most discriminative point. Following this discovery, we propose two simple yet effective data augmentation methods, CutRot and CutCov, specifically designed to self-randomize focuses within an image for metric-based few-shot image classification. CutRot randomly rotates a patch within the image, while CutCov randomly swaps patches within the image. Extensive experiments verify that CutRot and CutCov significantly boost the performance of both classic and recent metric-based methods and outperform Cutout, Mixup, and CutMix on four few-shot image classification datasets.
Code: https://github.com/liz-lut/CutRot-and-CutCov-main

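Going by the abstract's descriptions alone, the two augmentations can be sketched as below (illustrative NumPy; the patch-size policy, overlap handling, and other details of the authors' implementation are assumptions):

```python
import numpy as np

def cut_rot(img, size, rng=None):
    """CutRot (as described): rotate one randomly placed square patch in place."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    h, w = img.shape[:2]
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    k = rng.integers(1, 4)  # rotate by 90, 180 or 270 degrees
    out[y:y + size, x:x + size] = np.rot90(out[y:y + size, x:x + size], k)
    return out

def cut_cov(img, size, rng=None):
    """CutCov (as described): swap two randomly placed square patches.
    This sketch does not prevent the two patches from overlapping."""
    rng = rng or np.random.default_rng()
    out = img.copy()
    h, w = img.shape[:2]
    (y1, x1), (y2, x2) = [(rng.integers(0, h - size + 1),
                           rng.integers(0, w - size + 1)) for _ in range(2)]
    p1 = out[y1:y1 + size, x1:x1 + size].copy()
    out[y1:y1 + size, x1:x1 + size] = out[y2:y2 + size, x2:x2 + size]
    out[y2:y2 + size, x2:x2 + size] = p1
    return out
```

Unlike Cutout or CutMix, both operations keep every pixel of the original image, only relocating content within it, which is consistent with the "self-randomized focuses" framing.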
SyntaPulse: An unsupervised framework for sentiment annotation and semantic topic extraction
Hadis Bashiri, Hassan Naderi
Pattern Recognition, vol. 164, Article 111593 (11 March 2025). DOI: 10.1016/j.patcog.2025.111593
Abstract: Sentiment analysis is a critical area of natural language processing, with applications in domains such as marketing, social media analytics, and politics. Current methods, however, struggle with contextual ambiguity, accurate detection of sarcasm and irony, and domain-specific vocabulary in the absence of large labeled datasets. Addressing these issues is essential: the nuanced nature of language allows diverse interpretations across contexts, sarcasm and irony remain difficult to identify precisely, and reliance on labeled data limits adaptability across fields. This paper presents SyntaPulse, a novel framework for sentiment classification in social networks designed to overcome these challenges. The framework combines an innovative dictionary-based approach with Probabilistic Syntactic Latent Semantic Analysis (PSLSA) for semantic topic extraction. This integration handles homographs effectively, enhancing sarcasm detection, facilitating the interpretation of domain-specific vocabulary, and reducing dependency on labeled data. Evaluated on 12 datasets, the framework demonstrates adaptability across domains, achieving Macro-F1 scores from 72.89% to 96.22% and improving on seven datasets by margins ranging from 0.21% to 2.97%.

Adaptive spatial and scale label assignment for anchor-free object detection
Min Dang, Gang Liu, Chao Chen, Di Wang, Xike Li, Quan Wang
Pattern Recognition, vol. 164, Article 111549 (11 March 2025). DOI: 10.1016/j.patcog.2025.111549
Abstract: Anchor-free object detection has attracted widespread attention in recent years for its simplicity and efficiency. Mainstream anchor-free detectors allocate positive/negative candidate samples through prior guidance at fixed spatial positions and assign them according to predefined scale constraints; however, hand-designing assignment strategies from the prior data distribution can hinder further optimization of label assignment. To this end, we propose Adaptive Spatial and Scale Label Assignment (ASS-LA) to improve the performance of anchor-free object detection. Positive/negative samples are drawn from different pyramid levels using spatial and scale constraints. Specifically, an adaptive Intersection-over-Union (IoU) spatial assignment is designed to select candidate positive sample points, and a membership degree introduced at each pyramid level adaptively fuzzifies the scale assignment range, from which the detector selects the final positive samples. Furthermore, a reference box is introduced to design a predicted-IoU branch coupled with regression; in the inference stage, the predicted IoU and classification score are combined as the confidence of the regressed bounding box, alleviating the inconsistency between classification and regression. Extensive experiments show that our method achieves performance comparable to existing label assignment schemes, and that introducing ASS-LA yields significant improvements for anchor-free detectors without additional overhead.

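The abstract does not give ASS-LA's exact adaptive IoU rule. The sketch below illustrates the general idea of a statistics-driven cut-off using the mean-plus-standard-deviation rule popularized by ATSS; it is a stand-in for adaptive assignment in general, not ASS-LA itself:

```python
import numpy as np

def iou(boxes, gt):
    """IoU between an (N, 4) array of boxes and one GT box, xyxy format."""
    x1 = np.maximum(boxes[:, 0], gt[0])
    y1 = np.maximum(boxes[:, 1], gt[1])
    x2 = np.minimum(boxes[:, 2], gt[2])
    y2 = np.minimum(boxes[:, 3], gt[3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_b + area_g - inter)

def adaptive_positive_mask(candidate_boxes, gt_box):
    """Mark candidates whose IoU exceeds a per-object adaptive threshold
    (mean + std of the candidate IoUs), so the cut-off tracks each object's
    statistics instead of being a fixed global value."""
    ious = iou(candidate_boxes, gt_box)
    thr = ious.mean() + ious.std()
    return ious >= thr
```

The per-object threshold is the point: a large, well-covered object and a small, poorly-covered one get different cut-offs from the same rule.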
Interpretable 2.5D network by hierarchical attention and consistency learning for 3D MRI classification
Shuting Pang, Yidi Chen, Xiaoshuang Shi, Rui Wang, Mingzhe Dai, Xiaofeng Zhu, Bin Song, Kang Li
Pattern Recognition, vol. 164, Article 111539 (11 March 2025). DOI: 10.1016/j.patcog.2025.111539
Abstract: Deep learning methods have been widely applied in diagnostic research on MRI data. Among them, attention-based multiple-instance learning, which provides classification results together with explanations of the task-relevant regions, has attracted considerable attention. However, prior methods are restricted by (i) the loss of spatial or volume information, (ii) semantic inconsistency of attention weights, and (iii) missing information exchange between the attention mechanisms of different branches. To overcome these issues, we propose HA-CSL, an innovative dual-branch attention-based deep multiple-instance learning framework consisting of a 2D branch, a 3D branch, a hierarchical attention (HA) module, and a consistency learning (CSL) module. The 2D and 3D branches employ 2D and 3D convolutional neural networks to extract 2D and 3D patch-level features, respectively, capturing richer image information. The HA module comprises slice-, region-, and channel-level attention to interpret the significance of slices, regions, and channels. The CSL module enhances the consistency of the attention weights obtained by the two branches, reducing their semantic gap and promoting information exchange between the branches. Experiments on two 3D MRI datasets demonstrate the superior classification and interpretation performance of the proposed framework over recent state-of-the-art methods.
Code: https://github.com/shuting-pang/HA_CSL
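The attention weights that make such frameworks interpretable are commonly computed with the classic attention-MIL pooling formulation. The following single-level sketch is that generic formulation, not the paper's hierarchical HA module; the slice-, region-, and channel-level attentions would apply the same pattern at different granularities:

```python
import numpy as np

def attention_mil_pool(H, V, w):
    """Attention-based MIL pooling: a_i = softmax_i(w^T tanh(V h_i)),
    bag embedding z = sum_i a_i h_i.

    H: (n, d) instance features (e.g. per-slice embeddings).
    V: (L, d) and w: (L,) learnable attention parameters.
    The weights a_i expose which instances drove the bag-level prediction,
    which is what yields the interpretation maps.
    """
    scores = np.tanh(H @ V.T) @ w            # (n,) unnormalized attention
    scores -= scores.max()                   # numerical stability
    a = np.exp(scores)
    a /= a.sum()                             # softmax over instances
    return a @ H, a                          # (d,) bag embedding, (n,) weights
```

Since the weights sum to one, the bag embedding is a convex combination of the instance features, and ranking the instances by a_i gives the "significant regions" the abstract refers to.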