Title: MMP: Enhancing unsupervised graph anomaly detection with multi-view message passing
Authors: Weihu Song, Lei Li, Mengxiao Zhu, Yue Pei, Haogang Zhu
Pattern Recognition 172 (2025), Article 112388. DOI: 10.1016/j.patcog.2025.112388. Published 2025-09-07.

Abstract: The complementary and conflicting relationships between views are two fundamental issues when applying Graph Neural Networks (GNNs) to multi-view attributed graph anomaly detection. Most existing approaches either ignore the inherent multi-view structure of the attribute space or exploit complementary information only through simple representation fusion, overlooking the conflicting information among views. In this paper, we argue that effectively applying GNNs to multi-view anomaly detection requires reinforcing complementary information between views and, more importantly, managing conflicting information. Building on this perspective, we introduce Multi-View Message Passing (MMP), a novel and effective message-passing paradigm designed specifically for multi-view anomaly detection. In MMP's multi-view aggregation phase, views carrying different types of information are integrated by view-specific aggregation functions, which lets the model dynamically adjust how much information it aggregates from complementary and conflicting views. This mitigates the suboptimal representation learning caused by insufficient complementary information and excessive conflicting information. Furthermore, we propose an aggregation loss mechanism that optimizes the reconstruction differences between the aggregated representations and the original views, improving both detection accuracy and model interpretability. Extensive experiments on synthetic and real-world datasets validate the effectiveness and robustness of our method. The source code is available at https://github.com/weihus/MMP.
{"title":"Learning from majority label: A novel problem in multi-class multiple-instance learning","authors":"Kaito Shiku, Shinnosuke Matsuo, Daiki Suehiro, Ryoma Bise","doi":"10.1016/j.patcog.2025.112425","DOIUrl":"10.1016/j.patcog.2025.112425","url":null,"abstract":"<div><div>The paper proposes a novel multi-class Multiple-Instance Learning (MIL) problem called Learning from Majority Label (LML). In LML, the majority class of instances in a bag is assigned as the bag-level label. The goal of LML is to train a classification model that estimates the class of each instance using the majority label. This problem is valuable in a variety of applications, including pathology image segmentation, political voting prediction, customer sentiment analysis, and environmental monitoring. To solve LML, we propose a Counting Network trained to produce bag-level majority labels, estimated by counting the number of instances in each class. Furthermore, analysis experiments on the characteristics of LML revealed that bags with a high proportion of the majority class facilitate learning. Based on this result, we developed a Majority Proportion Enhancement Module (MPEM) that increases the proportion of the majority class by removing minority class instances within the bags. Experiments demonstrate the superiority of the proposed method on four datasets compared to conventional MIL methods. Moreover, ablation studies confirmed the effectiveness of each module. The code is available at <span><span>here</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112425"},"PeriodicalIF":7.6,"publicationDate":"2025-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145046647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature subset weighting for distance-based supervised learning","authors":"Adnan Theerens , Yvan Saeys , Chris Cornelis","doi":"10.1016/j.patcog.2025.112424","DOIUrl":"10.1016/j.patcog.2025.112424","url":null,"abstract":"<div><div>This paper introduces feature subset weighting using monotone measures for distance-based supervised learning. The Choquet integral is used to define a distance function that incorporates these weights. This integration enables the proposed distances to effectively capture non-linear relationships and account for interactions both between conditional and decision attributes and among conditional attributes themselves, resulting in a more flexible distance measure. In particular, we show how this approach ensures that the distances remain unaffected by the addition of duplicate and strongly correlated features. Another key point of this approach is that it makes feature subset weighting computationally feasible, since only <span><math><mi>m</mi></math></span> feature subset weights should be calculated each time instead of calculating all feature subset weights (<span><math><msup><mn>2</mn><mi>m</mi></msup></math></span>), where <span><math><mi>m</mi></math></span> is the number of attributes. Next, we also examine how the use of the Choquet integral for measuring similarity leads to a non-equivalent definition of distance. The relationship between distance and similarity is further explored through dual measures. Additionally, symmetric Choquet distances and similarities are proposed, preserving the classical symmetry between similarity and distance. Finally, we introduce a concrete feature subset weighting distance, evaluate its performance in a <span><math><mi>k</mi></math></span>-nearest neighbours (KNN) classification setting, and compare it against Mahalanobis distances and weighted distance methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112424"},"PeriodicalIF":7.6,"publicationDate":"2025-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145046646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Hyper-network curvature: A new representation method for high-order brain network analysis
Authors: Kai Ma, Tianyu Du, Qi Zhu, Xuyun Wen, Jiashuang Huang, Xibei Yang, Daoqiang Zhang
Pattern Recognition 172 (2025), Article 112397. DOI: 10.1016/j.patcog.2025.112397. Published 2025-09-06.

Abstract: The human brain is a complex system containing abundant high-order interactions among multiple brain regions, which can be described by a brain hyper-network. In brain hyper-networks, nodes represent brain regions of interest (ROIs), while hyper-edges describe interactions among multiple ROIs, providing important high-order information for brain disease analysis and diagnosis. However, most existing hyper-network studies have focused on hyper-connection (i.e., hyper-edge) analysis and ignored the local topological information of nodes. To address this problem, we propose a new representation method, hyper-network curvature, for brain hyper-network analysis. Compared with existing hyper-network representation methods, the proposed curvature can be used to analyze the local topology of nodes in brain hyper-networks. Based on hyper-network curvature, we further propose a novel graph kernel, the brain hyper-network curvature kernel, to measure the similarity of a pair of brain hyper-networks. We prove that the proposed hyper-network curvature is bounded and that the brain hyper-network curvature kernel is positive definite. To evaluate the effectiveness of our method, we perform classification experiments on functional magnetic resonance imaging data of brain diseases. The experimental results demonstrate that our method significantly improves classification accuracy compared to state-of-the-art graph kernels and graph neural networks for classifying brain diseases.
Title: SATE: Efficient knowledge distillation with implicit student-aware teacher ensembles
Authors: Diqi Chen, Yang Li, Jiajun Liu, Jun Zhou, Yongsheng Gao
Pattern Recognition 172 (2025), Article 112355. DOI: 10.1016/j.patcog.2025.112355. Published 2025-09-06.

Abstract: Recent findings suggest that, for the same teacher architecture, a fully converged or "stronger" checkpoint surprisingly leads to a worse student. This can be explained by the Information Bottleneck (IB) principle: the features of a weaker teacher transfer more "dark" knowledge because they retain higher mutual information with the inputs. Meanwhile, various works have shown that severe teacher-student structural disparity or capability mismatch often degrades student performance. To deal with these issues, we propose a generalizable and efficient Knowledge Distillation (KD) framework with implicit Student-Aware Teacher Ensembles (SATE). The SATE framework simultaneously trains a student network and a student-aware intermediate teacher as a learning companion. With the proposed co-training strategy, the intermediate teacher is trained gradually and forms an implicit ensemble of weaker teachers along the learning process. This design enables the student model to retain more dark knowledge and generalize better. The framework improves the training scheme in a plug-and-play way, so it can be applied to various classic and state-of-the-art KD methods in both intra-domain (up to 2.184% improvement) and cross-domain (up to 7.358%) settings, under diverse teacher-student architecture configurations, and it achieves a substantial efficiency advantage over other generic frameworks. The code is available at https://github.com/diqichen91/SATE.git.
{"title":"MonoA2: Adaptive depth with augmented head for monocular 3D object detection","authors":"Jinpeng Dong , Sanping Zhou , Yufeng Hu , Yuhao Huang , Jingjing Jiang , Weiliang Zuo , Shitao Chen , Nanning Zheng","doi":"10.1016/j.patcog.2025.112418","DOIUrl":"10.1016/j.patcog.2025.112418","url":null,"abstract":"<div><div>Monocular 3D object detection is a hot direction due to its low cost and configuration simplicity. Achieving accurate instance depth prediction from monocular images is a challenging problem in monocular 3D object detection. Many existing methods perform instance depth prediction based on fixed rules, which are not flexible for various objects. Furthermore, these methods ignore the design of more discriminative task heads. To address these issues, we propose the MonoA<span><math><msup><mrow></mrow><mn>2</mn></msup></math></span>, which consists of the Adaptive Depth Module (ADM) and the Augmented Head Module (AHM). The ADM is used to achieve more accurate depth prediction by learning adaptive offsets to decouple the depth prediction from object center constraints. The AHM is proposed to obtain more discriminative task heads through task-aware attention and task-interaction attention. The task-aware attention can generate different weights adapted to different tasks and the task-interaction attention can guide depth tasks to interact with other tasks. Experimental results on the KITTI and Waymo datasets demonstrate the effectiveness of the proposed method. Our method achieves superior performance on the KITTI and Waymo benchmarks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112418"},"PeriodicalIF":7.6,"publicationDate":"2025-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145158614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: MIGF-Net: Multimodal interaction-guided fusion network for image aesthetics assessment
Authors: Yun Liu, Zhipeng Wen, Leida Li, Peiguang Jing, Daoxin Fan
Pattern Recognition 172 (2025), Article 112401. DOI: 10.1016/j.patcog.2025.112401. Published 2025-09-05.

Abstract: With the development of social media, people post images together with comments to share their ideas, providing rich visual and textual semantic information for image aesthetics assessment (IAA). However, most previous works either extracted unimodal aesthetic features from the image alone, because comments are hard to obtain, or combined multimodal information while ignoring the interaction between image and comment, which limits overall performance. To solve this problem, we propose a Multimodal Interaction-Guided Fusion Network (MIGF-Net) for IAA that exploits both image and comment semantics: it addresses the challenge of generating missing comments and models the interaction between the two modalities. Specifically, considering the coupling mechanism of the image theme, we construct a visual semantic fusion module that extracts visual semantic features from visual attributes and theme features. A textual semantic feature extractor is then designed to mine the semantic information hidden in comments, which not only addresses the issue of missing comments but also effectively complements the visual semantic features. Furthermore, we establish a Dual-Stream Interaction-Guided Fusion module that fuses the semantic features of images and comments, fully exploring their interactive relationship in the spirit of the human brain's perception mechanism. Experimental results on two public image aesthetics datasets demonstrate that our model outperforms current state-of-the-art methods. Our code will be released at https://github.com/wenzhipeng123/MIGF-Net.
{"title":"Aegis: A domain generalization framework for medical image segmentation by mitigating feature misalignment","authors":"Yuheng Xu , Taiping Zhang , Yuqi Fang","doi":"10.1016/j.patcog.2025.112406","DOIUrl":"10.1016/j.patcog.2025.112406","url":null,"abstract":"<div><div>Domain shift caused by variations in data acquisition significantly impedes the deployment of medical image segmentation models in clinical settings. Domain generalization aims to mitigate performance degradation induced by domain shift by training a model using source domain data and generalize well to unseen target domain. In this work, we have an interesting observation: domain shift results in significantly different activation patterns across domains even they have semantically identical input. This cross-domain “feature misalignment” phenomenon motivates us to develop a hypothesis: mitigating cross-domain feature misalignment may enhance domain generalization. To this end, we propose a framework called <strong>Aegis</strong>, which employs style augmentation to generate augmented image features that simulate domain shift. Subsequently, we introduce a dual attention-guided feature calibration (DAFC) module to facilitate feature interaction between source and augmented images, thereby establishing an implicit alignment constraint within the shared feature space. Furthermore, we propose an uncertainty-guided feature alignment (UFA) loss, which quantifies segmentation discrepancies caused by domain shift and incorporates an uncertainty-weighting mechanism to enhance the alignment of hard-to-classify pixel regions. These components work in synergy to effectively mitigate cross-domain feature misalignment, promote robust feature alignment, and ultimately improve cross-domain generalization. Extensive experiments conducted on three widely used benchmarks demonstrate that the proposed framework significantly outperforms existing methods in domain generalization. Code is available at <span><span>https://github.com/Zerua-bit/Aegis</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112406"},"PeriodicalIF":7.6,"publicationDate":"2025-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145158616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Time series adaptive mode decomposition (TAMD): Method for improving forecasting accuracy in the apparel industry
Authors: Guangbao Zhou, Pengliang Liu, Quanle Lin, Miao Qian, Zhong Xiang, Zeyu Zheng, Lixian Liu
Pattern Recognition 172 (2025), Article 112417. DOI: 10.1016/j.patcog.2025.112417. Published 2025-09-05.

Abstract: Accurate forecasting of apparel sales is critical for inventory management, supply chain optimization, and market strategy planning. However, existing forecasting models often fail to capture the complex characteristics of apparel sales data, such as distinct seasonality, cyclicality, and strongly nonlinear fluctuations, which significantly hinders prediction accuracy and generalization. To address these challenges, this study introduces a novel forecasting algorithm based on Time series Adaptive Mode Decomposition (TAMD). The proposed method (1) employs Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and sample-entropy-guided Variational Mode Decomposition (VMD) to separate the input time series into noise components and multiple smooth Intrinsic Mode Functions (IMFs), better capturing the intrinsic data dynamics; (2) refines the distribution features of the sub-series with a sample-entropy-guided adaptive module, dividing each sub-series into subsequences with maximal distribution difference to improve adaptability to periodic changes and market volatility; and (3) predicts each subsequence with adaptive distribution matching over discontinuous random subsequence combinations, then linearly superposes the predictions into the final output, boosting accuracy and generalizability. Comprehensive experiments on both public and self-constructed datasets (including four years of Taobao sales data for dresses, jeans, sweatshirts, and sweaters, totaling over 44.7 million records) demonstrate that TAMD significantly outperforms existing methods, highlighting its effectiveness in revealing the complexity of apparel market data and enhancing prediction performance.
Title: D3PD: Dual distillation and dynamic fusion for camera-radar 3D perception
Authors: Junyin Wang, Chenghu Du, Tongao Ge, Bingyi Liu, Shengwu Xiong
Pattern Recognition 172 (2025), Article 112350. DOI: 10.1016/j.patcog.2025.112350. Published 2025-09-05.

Abstract: Autonomous driving perception is driving rapid advances in Bird's-Eye-View (BEV) technology. The combination of surround-view imagery and radar is seen as a cost-friendly way to enhance the understanding of driving scenarios. However, current methods for fusing radar and camera features lack effective environmental-perception guidance and dynamic adjustment capabilities, which restricts their performance in real-world scenarios. In this paper, we introduce the D3PD framework, which combines fusion techniques with knowledge distillation to tackle this dynamic-guidance deficit. Our method includes two key components: Radar-Camera Feature Enhancement (RCFE) and Dual Distillation Knowledge Transfer. The RCFE module enhances the areas of interest in BEV, addressing the poor object-perception performance of single-modal features. The Dual Distillation Knowledge Transfer comprises four modules: Camera Radar Sparse Distillation (CRSD), which transfers sparse-feature knowledge and aligns teacher and student network features; Position-guided Sampling Distillation (SamD), which refines the knowledge transfer of fused features through dynamic sampling; Detection Constraint Result Distillation (DcRD), which strengthens the positional correlation between teacher and student outputs during forward propagation for more precise detection; and Self-learning Mask Focused Distillation (SMFD), which focuses knowledge transfer on detection results through self-learning, reinforcing local key areas. D3PD outperforms existing methods on the nuScenes benchmark, achieving 49.6% mAP and 59.2% NDS. Moreover, on the occupancy prediction task, D3PD-Occ achieves a state-of-the-art 37.94% mIoU. These results provide insights for the design and training of camera- and radar-based 3D object detection and occupancy prediction methods. The code will be available at https://github.com/no-Name128/D3PD.