Zero-shot image denoising with hollow pair sampling and noise-aware attention
Sheng Wang, Chaoyue Zhao, Qiao Wang, Mingzhi Liu, Chao Mou, Fu Xu
Pattern Recognition, vol. 167, Article 111779 (published 2025-05-10). DOI: 10.1016/j.patcog.2025.111779
Abstract: Image acquisition, compression, and transmission frequently introduce noise distortions that significantly degrade the visual quality of images. Existing zero-shot denoising methods struggle to differentiate between noise and subtle image content, especially in the presence of structured noise, non-zero-mean noise, and high noise levels; as a result, they may over-smooth images or lose fine details. In this paper, we propose a novel image denoising technique that does not rely on clean reference images for training. Instead, we apply secondary sampling and convolution to the original noisy images to generate denoised images with enhanced contrast. Our sampling strategy expands upon the Zero-Shot Noise2Noise (ZS-N2N) approach, eliminating the need for additional noise models or parameters. By employing a straightforward hollow filter and noise-aware attention, our method achieves high-quality denoising across various noise types and levels while effectively distinguishing meaningful image features from noisy patterns. Experimental evaluations on visible-light and infrared images demonstrate the effectiveness of our approach, which is particularly strong at restoring image details. Under Gaussian noise, the method achieves a PSNR of 37.78 and an SSIM of 0.9460 on visible images, and a PSNR of 36.61 and an SSIM of 0.9415 on infrared images. Overall, our method mitigates noise distortion while preserving rich image details, significantly enhancing visual quality.
GraphMamba: Whole slide image classification meets graph-driven selective state space model
Tingting Zheng, Hongxun Yao, Sicheng Zhao, Kui Jiang, Yi Xiao
Pattern Recognition, vol. 167, Article 111768 (published 2025-05-08). DOI: 10.1016/j.patcog.2025.111768
Abstract: Multi-instance learning (MIL) has demonstrated promising performance in whole slide image (WSI) analysis. However, existing transformer-based methods must trade global representation capability against quadratic complexity, particularly when handling millions of instances. Recently, the selective state space model (Mamba) has emerged as a promising alternative for modeling long-range dependencies with linear complexity. Nonetheless, WSI analysis remains challenging for Mamba because it cannot capture the complex local tissue and structural patterns that are crucial for accurate tumor region recognition. To this end, we approach WSI classification from a graph-based perspective and present GraphMamba, a novel method that constructs multi-level graphs across instances. GraphMamba involves two key components: intra-group graph mamba (IGM) to grasp instance-level dependencies, and cross-group graph mamba (CGM) for exploring group-level relationships. In particular, before aggregating group features into a comprehensive bag representation, CGM uses a cross-group feature sampling scheme to extract the most informative features across groups, enabling compact and discriminative representations. Extensive experiments on four datasets show that GraphMamba outperforms the state-of-the-art ACMIL method by 0.5%, 3.1%, 2.6%, and 3.0% in accuracy on the TCGA BRCA, TCGA Lung, TCGA ESCA, and BRACS datasets. The source code will be available at https://github.com/titizheng/GraphMamba.
From a multi-period perspective: A periodic dynamics forecasting network for multivariate time series forecasting
Gang Tan, Yueyang Wang, Ziyi Xiao, Dandan He, Guodong Sa
Pattern Recognition, vol. 167, Article 111760 (published 2025-05-08). DOI: 10.1016/j.patcog.2025.111760
Abstract: Accurate multivariate time series (MTS) forecasting requires extracting both the temporal features of individual univariate series and the interdependencies among multiple variables that evolve over time (i.e., dynamic interdependencies). Many MTS exhibit inherent periodic fluctuations that are highly significant for effective modeling. However, existing methods for modeling periodic features focus mainly on the multi-period temporal features of univariate series and do not sufficiently address the complex interdependencies across multiple periods, including dynamic dependencies among variables within the same period (intra-period) and across different periods (inter-period). Conversely, methods for modeling dynamic interdependencies are insufficient for mining the inherent multi-periodicity of MTS. Collaboratively capturing both intra-period and inter-period interdependencies therefore remains challenging. To address these challenges, this paper introduces the Periodic Dynamics Forecasting Network (PDFNet) for modeling multi-period dynamic interdependencies of MTS. We design a periodic feature extraction module that uses frequency-domain analysis to identify the multi-period features of MTS, and a multi-period temporal networks module that captures temporal features within and across periods from a multi-period perspective. To capture intra-period and inter-period dynamic dependencies among multiple variables, we propose a gated periodic recurrent unit and a gated graph structure learning module that construct dynamic graphs, whose intra-period and inter-period information is then learned through dynamic graph convolution networks. Extensive experiments on multiple MTS datasets demonstrate superior performance compared with state-of-the-art methods.
{"title":"Synergistic fusion framework: Integrating training and non-training processes for accelerated graph convolution network-based recommendation","authors":"Fan Mo , Xin Fan , Chongxian Chen , Hayato Yamana","doi":"10.1016/j.patcog.2025.111829","DOIUrl":"10.1016/j.patcog.2025.111829","url":null,"abstract":"<div><div>The training and inference (generating recommendation lists) of Graph convolution networks (GCN)-based recommendation models are time-consuming. Existing techniques aim to improve the training speed by proposing new GCN variants. However, the development of GCN leads to multiple technological branches using graph-enhancement techniques, including subgraph and edge sampling techniques. Simply proposing a GCN variant for training acceleration is inadequate, lacking a generalized training acceleration framework for multiple GCN models. Another weakness of previous studies is neglecting the importance of inference speed. This study introduces a candidate-based fusion framework to accelerate the training and inference of GCN models. The idea for training acceleration is to achieve layer compression by aggregating information directly from candidate items generated in a non-training process. Besides, we achieve inference acceleration by ranking items only in the candidate sets. The proposed framework is generalized across six state-of-the-art GCN models. Experimental results confirm the effectiveness of the method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111829"},"PeriodicalIF":7.5,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143943488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SA-CVSR: Scale-Arbitrary Compressed Video Super-Resolution
Gang He, Chang Wu, Guancheng Quan, Xinquan Lai, Yunsong Li
Pattern Recognition, vol. 167, Article 111745 (published 2025-05-06). DOI: 10.1016/j.patcog.2025.111745
Abstract: To mitigate transmission and storage costs, existing compressed video super-resolution (CVSR) approaches typically downsample high-resolution (HR) videos before encoding and then restore the decoded videos to their original resolution with deep neural networks (DNNs). However, they employ fixed integer scale factors for diverse video types and compression ratios, which can lead to suboptimal performance. In this paper, we propose a Scale-Arbitrary Compressed Video Super-Resolution (SA-CVSR) approach that achieves an optimal trade-off between bit-rate and quality. We first apply a Support Vector Machine (SVM)-based scale predictor to determine the optimal scale factor for an individual video across various compression ratios. We then design a novel Priors-Guided Restoration–Reconstruction Network (PGRRN), constructed by stacking multiple Priors-Guided Processing Blocks (PGPBs), to process low-resolution (LR) compressed videos in two stages. In the restoration stage, PGPBs perform precise motion compensation between temporally adjacent frames and incorporate the coding prior, enabling PGRRN to remove compression artifacts content-adaptively. In the subsequent reconstruction stage, PGPBs incorporate the scale prior to achieve high-quality scale-arbitrary super-resolution. Extensive experiments demonstrate the effectiveness of SA-CVSR, which achieves substantial bit-rate reductions compared with other CVSR approaches on multiple datasets.
Opinion-unaware blind stereoscopic image quality assessment: A comprehensive study
Jiebin Yan, Yuming Fang, Xuelin Liu, Wenhui Jiang, Yang Liu
Pattern Recognition, vol. 167, Article 111749 (published 2025-05-06). DOI: 10.1016/j.patcog.2025.111749
Abstract: The development of blind stereoscopic image quality assessment (SIQA) is hindered by the limited availability of large-scale datasets. Existing SIQA databases typically contain only a few hundred images, often derived from a small number of original sources. This data scarcity poses a significant challenge, particularly in the deep learning era: it increases the risk of overfitting and makes performance comparisons of different blind SIQA models on public databases unreliable. Consequently, determining the best-performing model remains difficult under current evaluation methodologies. To address this limitation and advance SIQA research, we construct the largest and most diverse SIQA database to date, incorporating both image-level coarse labels and single-view pseudo labels. Using this extensive dataset, we conduct a comprehensive study of blind SIQA models, exploring variations in network architecture, input size, and auxiliary supervision signals. The representational capabilities of various blind SIQA models and their variants are systematically evaluated under consistent training conditions, specifically pairwise opinion-unaware learning. This new benchmark provides a more reliable platform for comparing blind SIQA models, enabling fairer and more comprehensive assessments of their relative strengths and limitations.
FocusMorph: A novel multi-scale fusion network for 3D brain MR image registration
Tianyong Liu, Zhiqing Zhang, Guojia Fan, Nan Li, Chengwu Xu, Bin Li, Gang Zhao, Shoujun Zhou
Pattern Recognition, vol. 167, Article 111761 (published 2025-05-06). DOI: 10.1016/j.patcog.2025.111761
Abstract: In medical image processing, registration algorithms are crucial tools, especially for aligning medical images acquired at different time points or through different modalities. These techniques are particularly important for applications such as disease diagnosis, lesion detection, surgical planning, and treatment monitoring. However, although most deep learning-based methods can extract multi-scale features, they may fail to produce outputs that are directly related to the final deformation field. In addition, many methods based on the U-Net structure rely excessively on the last high-resolution layer, which is a significant drawback. To address these issues, we propose a novel unsupervised deformable registration method named FocusMorph. The method is built around the FLatten Transformer block and employs a focused linear attention mechanism to enhance attentional expressivity while maintaining low complexity. We also design a layer-by-layer output fusion mechanism and a motion image encoder specifically for medical image registration, which helps continuously track positional differences of the moving image and fuse them effectively. Experimental results show that FocusMorph surpasses current leading medical image registration techniques on two distinct brain image datasets, improving the Dice coefficient by 2.6% and 1.5%, respectively. These findings highlight FocusMorph's robust registration capabilities and its promising prospects in medical image processing.
Deep learning techniques for Video Instance Segmentation: A survey
Chenhao Xu, Chang-Tsun Li, Yongjian Hu, Chee Peng Lim, Douglas Creighton
Pattern Recognition, vol. 167, Article 111763 (published 2025-05-06). DOI: 10.1016/j.patcog.2025.111763
Abstract: Video Instance Segmentation (VIS), also known as multi-object tracking and segmentation, is a fundamental challenge in computer vision that requires simultaneous detection, segmentation, and tracking of object instances across video frames. This complex task has gained significant attention due to its crucial role in various real-world applications. The advent of deep learning has advanced VIS considerably, leading to numerous architectural innovations and performance improvements. This survey presents a systematic review of deep learning-based VIS methods, introducing a novel categorization based on temporal modeling strategies: frame-by-frame, clip-based, in-memory feature propagation, and in-memory object query propagation. Comprehensive quantitative comparisons of existing work across three major VIS benchmark datasets are also provided. Additionally, emerging challenges in the field are explored and several promising research directions are identified, providing valuable insights for researchers and practitioners interested in VIS and further advancing deep learning techniques for VIS.
Knowledge Integration for Grounded Situation Recognition
Jiaming Lei, Sijing Wu, Lin Li, Lei Chen, Jun Xiao, Yi Yang, Long Chen
Pattern Recognition, vol. 167, Article 111766 (published 2025-05-05). DOI: 10.1016/j.patcog.2025.111766
Abstract: Grounded Situation Recognition (GSR) interprets complex events in images by identifying key verbs (e.g., sketching), detecting related semantic roles (e.g., AGENT is man), and localizing noun entities with bounding boxes. Because verbs and noun entities are semantically correlated, existing methods predominantly leverage these correlations to refine verb predictions using noun entities, or vice versa. However, these approaches often disregard the long-tailed distributions inherent in the training data, resulting in biased predictions and poor accuracy on less frequent noun entities and verbs. To tackle this issue, we introduce a novel KnOwledge Integration (KOI) strategy that alleviates the bias by merging two distinct types of knowledge: general knowledge and GSR-specific downstream knowledge. Specifically, the integration employs vision-language models (VLMs) such as CLIP to extract expansive, contextual general knowledge, which is potentially beneficial for tail category recognition, and harnesses pre-trained GSR models for detailed, domain-focused downstream knowledge, which is typically advantageous for head category recognition. To bridge the gap between general and specific knowledge, we devise a trade-off weighting strategy that effectively merges these diverse insights, ensuring robust predictions that are not strongly biased toward either head or tail categories. KOI's model-agnostic nature facilitates its integration into various GSR frameworks, proving its universality. Extensive experimental results on the SWiG dataset demonstrate that KOI significantly outperforms existing methods, establishing new state-of-the-art performance across multiple metrics.
TRRS-DM: Two-stage Resampling and Residual Shifting for high-fidelity texture inpainting of Terracotta Warriors utilizing Diffusion Models
Xin Cao, Peiyuan Quan, Yuzhu Mao, Rui Cao, Linzhi Su, Kang Li
Pattern Recognition, vol. 167, Article 111753 (published 2025-05-05). DOI: 10.1016/j.patcog.2025.111753
Abstract: As a UNESCO World Heritage Site, the Terracotta Warriors face degradation from natural erosion. Traditional restoration is time-consuming, while computer-aided methods provide efficient digital solutions. We propose a Two-stage Resampling and Residual Shifting framework using Diffusion Models (TRRS-DM) for texture inpainting. The ResampleDiff module enhances details via perception-weighted learning and lightweight diffusion, and the RefineDiff module refines the results in latent space by removing noise. Experiments demonstrate that TRRS-DM achieves faster computation than existing methods, surpasses them in visual quality, and effectively restores damaged artifacts. This approach advances digital heritage restoration and provides scalable support for archaeological conservation. Our code is available at https://github.com/Emwew/TRRS-DM.