{"title":"Attribute alignment networks for generalized zero-shot learning","authors":"Nannan Lu, Mingkai Qiu, Jiansheng Qian","doi":"10.1016/j.patrec.2025.09.010","DOIUrl":"10.1016/j.patrec.2025.09.010","url":null,"abstract":"<div><div>Part-based embedding methods with attention mechanism achieved outstanding results in zero-shot learning (ZSL). However, affected by intra-class variations in the datasets (i.e., different samples of the same class present different attribute characteristics), it is difficult for models based on traditional attention mechanisms to achieve accurate visual-attribute alignment. To tackle this problem, we propose a novel approach to fully utilize attributes information, referred to as attribute alignment networks (AAN). It consists of the attribute alignment (AA) pipeline and the attribute enhancement (AE) module. AA pipeline is a brand-new solution for part-based embedding method, which realizes visual-attribute alignment in both attribute space and attribute semantic space under the supervision of class attribute vectors and attribute word vectors, respectively. AE module employs the Graph Neural Networks (GNNs) to project visual features to the attribute semantic space. Based on the constructed attribute relation graph (ARG) and self-attention mechanism, AE module generates the enhanced representation of attributes to minimize the influence of intra-class variations. Experiments on standard datasets demonstrate that the enhanced attribute representation greatly improves the classification performance. Overall, AAN outperforms the other state-of-the-art performances in ZSL and GZSL tasks.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 50-56"},"PeriodicalIF":3.3,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RoNFA: Robust neural field-based approach for few-shot image classification with noisy labels","authors":"Nan Xiang , Lifeng Xing , Dequan Jin","doi":"10.1016/j.patrec.2025.09.009","DOIUrl":"10.1016/j.patrec.2025.09.009","url":null,"abstract":"<div><div>In few-shot learning (FSL), the labeled samples are scarce. Thus, label errors can significantly reduce classification accuracy. Since label errors are inevitable in realistic learning tasks, improving the robustness of the model in the presence of label errors is critical. This paper proposes a new robust neural field-based image approach (RoNFA) for few-shot image classification with noisy labels. RoNFA consists of two neural fields for feature and category representation. They correspond to the feature space and category set. Each neuron in the field for category representation (FCR) has a receptive field (RF) on the field for feature representation (FFR) centered at the representative neuron for its category generated by soft clustering. In the prediction stage, the range of these receptive fields adapts according to the neuronal activation in FCR to ensure prediction accuracy. These learning strategies provide the proposed model with excellent few-shot learning capability and strong robustness against label noises. The experimental results on real-world FSL datasets with three different types of label noise demonstrate that the proposed method significantly outperforms state-of-the-art FSL methods. Its accuracy obtained in the presence of noisy labels even surpasses the results obtained by state-of-the-art FSL methods trained on clean support sets, indicating its strong robustness against noisy labels.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 36-42"},"PeriodicalIF":3.3,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large margin classifier with graph-based adaptive regularization","authors":"Vítor M. Hanriot , Turíbio T. Salis , Luiz C.B. Torres , Frederico Coelho , Antonio P. Braga","doi":"10.1016/j.patrec.2025.09.008","DOIUrl":"10.1016/j.patrec.2025.09.008","url":null,"abstract":"<div><div>This paper introduces the use of per-class regularization hyperparameters in Gabriel graph-based binary classifiers. We demonstrate how the quality index used for regularization behaves both in the margin region and in the presence of outliers, and how incorporating this regularization flexibility can lead to solutions that effectively eliminate outliers while training the classifier. We also show how it can address class imbalance by generating higher and lower thresholds for the majority and minority classes, respectively. Thus, rather than having a single solution based on fixed thresholds, flexible thresholds expand the solution space and can be optimized through hyperparameter tuning algorithms. Friedman test shows that flexible thresholds are capable of improving Gabriel graph-based classifiers.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 43-49"},"PeriodicalIF":3.3,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-period interaction networks for time series forecasting","authors":"Yuqing Xie, Lujuan Dang, Badong Chen","doi":"10.1016/j.patrec.2025.09.007","DOIUrl":"10.1016/j.patrec.2025.09.007","url":null,"abstract":"<div><div>Forecasting time series with complex and overlapping periodic patterns remains a major challenge, especially in long-term prediction tasks where both local and cross-period dependencies must be modeled. In this work, we propose Multi-Period Interaction Networks, a fully multilayer perceptron based architecture designed to capture temporal dynamics across multiple periodic components. The core of the model is a Period-Frequency Interaction module, which enables dynamic modeling of multi-periodic structures in time series data. We evaluate MPINet on the task of lithium-ion battery state of health estimation, where accurate long-term prediction is essential for ensuring system reliability and safety. Extensive experiments on real-world battery datasets demonstrate that MPINet achieves state-of-the-art forecasting accuracy while maintaining high computational efficiency, highlighting its effectiveness for both battery health monitoring and broader time series forecasting applications.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 29-35"},"PeriodicalIF":3.3,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145159658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust watermarks for audio diffusion models by quadrature amplitude modulation","authors":"Kyungryeol Lee , Seongmin Hong , Se Young Chun","doi":"10.1016/j.patrec.2025.08.023","DOIUrl":"10.1016/j.patrec.2025.08.023","url":null,"abstract":"<div><div>Generative models enable the creation of high-quality digital content, including images, videos, and audio, making these tools increasingly accessible to users. As their use grows, so does the need for robust copyright protection mechanisms. Existing watermarking methods, primarily post-hoc, can safeguard the copyrights of users but fail to protect service providers, leaving room for intentional misuse, such as erasing watermarks and falsely claiming originality. To address this, previous works proposed integrating watermarks into the noise initialization of diffusion models for image generation, ensuring robustness against attacks like cut-and-paste. However, this approach has not been investigated for audio generation, where the 1D nature of audio data requires fundamentally different techniques. In this paper, we propose a novel barcode-like watermarking method for audio diffusion models, leveraging 4-quadrature amplitude modulation (4-QAM) to embed twice as much information as amplitude modulation methods for existing image generations. Our approach demonstrates significantly improved robustness against attacks, including cut-and-paste, and outperforms state-of-the-art audio watermarking techniques in preserving information and ensuring copyright protection for both users and service providers.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 22-28"},"PeriodicalIF":3.3,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145121301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From open-vocabulary to vocabulary-free semantic segmentation","authors":"Klara Reichard , Giulia Rizzoli , Stefano Gasperini , Lukas Hoyer , Pietro Zanuttigh , Nassir Navab , Federico Tombari","doi":"10.1016/j.patrec.2025.08.025","DOIUrl":"10.1016/j.patrec.2025.08.025","url":null,"abstract":"<div><div>Open-vocabulary semantic segmentation enables models to identify novel object categories beyond their training data. While this flexibility represents a significant advancement, current approaches still rely on manually specified class names as input, creating an inherent bottleneck in real-world applications. This work proposes a Vocabulary-Free Semantic Segmentation pipeline, eliminating the need for predefined class vocabularies. Specifically, we address the chicken-and-egg problem where users need knowledge of all potential objects within a scene to identify them, yet the purpose of segmentation is often to discover these objects. The proposed approach leverages Vision–Language Models to automatically recognize objects and generate appropriate class names, aiming to solve the challenge of class specification and naming quality. Through extensive experiments on several public datasets, we highlight the crucial role of the text encoder in model performance, particularly when the image text classes are paired with generated descriptions. Despite the challenges introduced by the sensitivity of the segmentation text encoder to false negatives within the class tagging process, which adds complexity to the task, we demonstrate that our fully automated pipeline significantly enhances vocabulary-free segmentation accuracy across diverse real-world scenarios. Code is available at <span><span>https://github.com/klarareichard/open-vocab2free-seg</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 14-21"},"PeriodicalIF":3.3,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145121300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orthogonal nonnegative matrix factorization with the Kullback–Leibler divergence","authors":"Jean Pacifique Nkurunziza , Fulgence Nahayo , Nicolas Gillis","doi":"10.1016/j.patrec.2025.08.012","DOIUrl":"10.1016/j.patrec.2025.08.012","url":null,"abstract":"<div><div>Orthogonal nonnegative matrix factorization (ONMF) has become a standard approach for clustering. As far as we know, most works on ONMF rely on the Frobenius norm to assess the quality of the approximation. This paper presents a new model and algorithm for ONMF that minimizes the Kullback–Leibler (KL) divergence. As opposed to the Frobenius norm which assumes Gaussian noise, the KL divergence is the maximum likelihood estimator for Poisson-distributed data, which can model better sparse vectors of word counts in document data sets and photo counting processes in imaging. We develop an algorithm based on alternating optimization, KL-ONMF, and show that it performs favorably with the Frobenius-norm based ONMF for document classification and hyperspectral image unmixing.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 353-358"},"PeriodicalIF":3.3,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145104695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TSMR-Net: a two-stage multimodal medical image registration method via pseudo-image generation and deformable registration","authors":"Dongxue Li , Xin Yang , Songyu Chen , Liwei Deng , Qi Lan , Sijuan Huang , Jing Wang","doi":"10.1016/j.patrec.2025.09.006","DOIUrl":"10.1016/j.patrec.2025.09.006","url":null,"abstract":"<div><div>Multimodal medical image registration is critical for accurate diagnosis, treatment planning, and surgical guidance. However, differences in imaging mechanisms cause substantial appearance discrepancies between modalities, hindering effective feature extraction and similarity measurement. We propose TSMR-Net, a two-stage multimodal registration framework. In the first stage, an Intensity Distribution Regression module nonlinearly transforms the fixed image into a modality-consistent generated fixed-like image, reducing inter-modality appearance gaps. In the second stage, a deformable registration network aligns the generated fixed-like and moving images using a unimodal similarity metric. The architecture incorporates a parallel downsampling module for multi-scale spatial feature capture and residual skip connections with a 3D channel interaction module to enhance feature propagation. Experiments on IXI and BraTS2023 datasets show that TSMR-Net outperforms state-of-the-art methods in alignment precision, structural consistency, and deformation stability. These findings validate the two-stage strategy’s effectiveness in bridging modality gaps and improving registration accuracy. TSMR-Net provides a scalable, robust solution for diverse multimodal registration tasks with strong potential for clinical application.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 359-367"},"PeriodicalIF":3.3,"publicationDate":"2025-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145104698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Which images can be effectively learnt from self-supervised learning?","authors":"Michalis Lazarou , Sata Atito , Muhammad Awais , Josef Kittler","doi":"10.1016/j.patrec.2025.09.003","DOIUrl":"10.1016/j.patrec.2025.09.003","url":null,"abstract":"<div><div>Self-supervised learning has shown unprecedented success for learning expressive representations that can be used effectively to solve downstream tasks. However, while the impressive results of self-supervised learning are undeniable there is still a certain mystery regarding how self-supervised learning models learn, what features are they learning and most importantly which examples are hard to learn. Contrastive learning is one of the prominent lines of research in self-supervised learning, where a subcategory of methods relies on knowledge-distillation between a student network and a teacher network which is an exponentially moving average of the student, initially proposed by the seminal work of DINO. In this work we investigate models trained using this family of self-supervised methods and reveal certain properties about them. Specifically, we propose a novel perspective on understanding which examples and which classes are difficult to be learnt effectively during training through the lens of information theory.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 8-13"},"PeriodicalIF":3.3,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145108645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TransStyle: Transformer-based StyleGAN for image inversion and editing","authors":"Yingchun Guo, Xueqi Lv, Gang Yan, Shu Chen, Shi Di","doi":"10.1016/j.patrec.2025.09.002","DOIUrl":"10.1016/j.patrec.2025.09.002","url":null,"abstract":"<div><div>Image inversion using StyleGAN retrieves latent codes by embedding real images into the GAN’s latent space, enabling attribute editing and high-quality image generation. However, existing methods often struggle with reconstruction reliability and flexible editing, resulting in low-quality outcomes. To address these issues, we propose TransStyle, a new StyleGAN inversion model based on Transformer technology. Our model features a novel encoder structure, PACP (Path Aggregation with Covariance Pooling), for improved feature representation and a feature prediction head that uses covariance pooling. Additionally, we propose a Transformer-based module to enhance interactions with semantic information in the latent space. StyleGAN then uses this enhanced latent code to generate images with high fidelity and strong editability. Experimental results demonstrate that our method achieves at least 5% higher face reconstruction similarity compared to current state-of-the-art techniques, confirming the advantages of TransStyle in image reconstruction and editing quality.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"198 ","pages":"Pages 1-7"},"PeriodicalIF":3.3,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145108644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}