{"title":"Learn depth space from light field via a distance-constraint query mechanism","authors":"Hao Sheng , Rongshan Chen , Ruixuan Cong , Da Yang , Zhenglong Cui , Sizhe Wang","doi":"10.1016/j.patcog.2025.112403","DOIUrl":"10.1016/j.patcog.2025.112403","url":null,"abstract":"<div><div>The Light Field (LF) captures both spatial and angular information of scenes, enabling precise depth estimation. Recent advancements in deep learning have led to significant success in this field; however, existing methods primarily focus on modeling surface characteristics (e.g., depth maps) while overlooking the depth space, which contains additional valuable information. The depth space consists of numerous space points and provides substantially more geometric data than a single depth map. In this paper, we conceptualize depth prediction as a spatial modeling problem, aiming to learn the entire depth space rather than merely a single depth map. Specifically, we define space points as signed distances relative to the scene surface and propose a novel distance-constraint query mechanism for LF depth estimation. To model the depth space effectively, we first develop a mixed sampling strategy to approximate its data representation. Subsequently, we introduce an encoder-decoder network architecture to query the distances of each point, thereby implicitly embedding the depth space. Finally, to extract the target depth map from this space, we present a generation algorithm that iteratively invokes the decoder network. 
Through extensive experiments, our approach achieves the highest performance on LF depth estimation benchmarks and also demonstrates superior performance on various synthetic and real-world scenes.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112403"},"PeriodicalIF":7.6,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145049782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
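The abstract's core idea — treating space points as signed distances to the scene surface, approximated by a mixed sampling strategy — can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the sign convention, the surface-band width, and the toy fronto-parallel scene are all hypothetical.

```python
import numpy as np

def signed_distance(depth_map, queries):
    """Signed distance of query points (u, v, z) to the scene surface,
    measured along the viewing ray: positive in front of the surface,
    negative behind it (sign convention is an assumption)."""
    u, v, z = queries[:, 0].astype(int), queries[:, 1].astype(int), queries[:, 2]
    return depth_map[v, u] - z

def mixed_sampling(depth_map, n_surface=4, n_uniform=4, band=0.05,
                   z_range=(0.0, 1.0), seed=0):
    """Crude stand-in for the paper's mixed sampling strategy: draw some
    points in a narrow band around the surface and some uniformly in depth."""
    rng = np.random.default_rng(seed)
    h, w = depth_map.shape
    u = rng.integers(0, w, n_surface + n_uniform)
    v = rng.integers(0, h, n_surface + n_uniform)
    z_surf = depth_map[v[:n_surface], u[:n_surface]] + rng.uniform(-band, band, n_surface)
    z_unif = rng.uniform(*z_range, n_uniform)
    z = np.concatenate([z_surf, z_unif])
    return np.stack([u, v, z], axis=1)

depth = np.full((8, 8), 0.5)   # toy scene: a fronto-parallel plane at depth 0.5
pts = mixed_sampling(depth)
d = signed_distance(depth, pts)
```

A decoder network would then be trained to regress `d` for arbitrary query points, implicitly embedding the whole depth space rather than a single depth map.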
{"title":"Joint luminance-chrominance learning for quality assessment of low-light image enhancement","authors":"Tuxin Guan , Qiuping Jiang , Xiongli Chai , Chaofeng Li","doi":"10.1016/j.patcog.2025.112395","DOIUrl":"10.1016/j.patcog.2025.112395","url":null,"abstract":"<div><div>Existing methods for low-light enhancement quality assessment (LEQA) often underperform across diverse scenarios. One reason is that most of them rely on shallow feature representations, while another is that deep-learning-based counterparts fail to make full use of the unique characteristics of low-light enhanced images (LEIs), such as luminance enhancement and color refinement. In this paper, we propose a novel Joint Luminance-Chrominance Learning Network (JLCLNet) for LEQA to comprehensively assess the effects of low-light image enhancement (LLIE) algorithms. Specifically, we construct a two-branch network architecture consisting of a luminance learning branch and a chrominance learning branch. In the luminance learning branch, the low- and high-frequency subbands of the luminance channel in the CIELAB color space, derived from the dual-tree complex wavelet transform (DTCWT), focus on measuring contrast enhancement and structure preservation. Meanwhile, the chrominance learning branch addresses potential color distortions by integrating perceptual information from the two parallel chrominance channels of the CIELAB color space. Finally, the complementary features from both branches are fused to predict quality scores. Experimental results on four public LEQA databases demonstrate the performance advantages of the proposed method compared to the state-of-the-art approaches. 
The source code of JLCLNet is available at <span><span>https://github.com/li181119/JLCLNET</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112395"},"PeriodicalIF":7.6,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145049755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
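The luminance branch's low/high-frequency decomposition can be illustrated with a toy split. Note the hedges: a simple box blur stands in for the DTCWT subbands the paper actually uses, and the CIELAB conversion is omitted — only the decompose-then-residual structure is shown.

```python
import numpy as np

def freq_split(lum, k=5):
    """Split a luminance channel into low- and high-frequency parts.
    A k-by-k box blur stands in for the DTCWT low-pass subband (the paper
    uses the dual-tree complex wavelet transform); high = residual."""
    pad = k // 2
    padded = np.pad(lum, pad, mode="edge")
    h, w = lum.shape
    low = np.zeros_like(lum, dtype=float)
    for i in range(h):
        for j in range(w):
            low[i, j] = padded[i:i + k, j:j + k].mean()
    high = lum - low          # high-frequency residual (structure/detail)
    return low, high

lum = np.linspace(0, 100, 64).reshape(8, 8)   # toy CIELAB L* channel
low, high = freq_split(lum)
```

In the network, `low` would feed the contrast-enhancement measurement and `high` the structure-preservation measurement, before fusion with the chrominance branch.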
{"title":"Federated cross-source learning for lung nodule segmentation with data characteristic-aware weight optimization","authors":"Xinjun Bian , Huan Lin , Yumeng Wang , Lingqiao Li , Zhenbing Liu , Huadeng Wang , Zhenwei Shi , Yi Qian , Zaiyi Liu , Rushi Lan , Xipeng Pan","doi":"10.1016/j.patcog.2025.112396","DOIUrl":"10.1016/j.patcog.2025.112396","url":null,"abstract":"<div><div>Federated learning enables multiple medical institutions to undertake distributed training while protecting patient privacy. Nevertheless, the significant variance in data distributions across diverse sites results in imbalanced knowledge acquisition, thereby affecting the performance of the global model. To tackle this challenge, we propose a novel federated algorithm for lung nodule segmentation, incorporating a Cross-source Learning (CSL) method. This method generates pseudo nodules by synthesizing the nodule phase spectrum with the nodule amplitude spectrum from other clients. These pseudo nodules are subsequently embedded into pulmonary regions to augment the data. Incorporating knowledge from various clients in this way alleviates the challenges posed by non-IID data. On the server side, a Data Characteristic-aware Weight Optimization (DCWO) method is proposed to incorporate client data quality assessment and the size of lung nodule volume as weights to optimize both model performance and fairness. On the client side, we design a Multi-scale Attention Dynamic Convolution (MADC) lightweight network, which dynamically adapts attention to different spatial regions and extracts features at multiple scales. 
Our method outperforms state-of-the-art methods on six public and in-house lung cancer CT datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112396"},"PeriodicalIF":7.6,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145049754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
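The phase/amplitude synthesis at the heart of CSL is a Fourier-domain mix: keep the local patch's phase (structure) and graft on a remote client's amplitude (appearance). The sketch below follows the abstract's description loosely; the exact spectrum assignment and any windowing the paper applies are assumptions.

```python
import numpy as np

def cross_source_mix(local_patch, remote_patch):
    """Synthesize a pseudo nodule: local phase spectrum combined with a
    remote client's amplitude spectrum, then inverse-transformed."""
    phase = np.angle(np.fft.fft2(local_patch))
    amplitude = np.abs(np.fft.fft2(remote_patch))
    pseudo = np.fft.ifft2(amplitude * np.exp(1j * phase)).real
    return pseudo

rng = np.random.default_rng(0)
local = rng.random((16, 16))    # toy local nodule patch
remote = rng.random((16, 16))   # toy patch from another client
pseudo = cross_source_mix(local, remote)
```

A sanity check on the construction: mixing a patch with itself reconstructs the patch, since amplitude times the phase unit vector recovers the original spectrum.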
{"title":"Asymmetric simulation-enhanced flow reconstruction for incomplete multimodal learning","authors":"Jiacheng Yao , Jing Zhang , Yixiao Wang , Li Zhuo","doi":"10.1016/j.patcog.2025.112413","DOIUrl":"10.1016/j.patcog.2025.112413","url":null,"abstract":"<div><div>Incomplete multimodal learning addresses the common real-world challenge of missing modalities, which undermines the performance of standard multimodal methods. Existing solutions struggle with distribution mismatches between reconstructed and observed data, asymmetric cross-modal structures, and insufficient cross-modal knowledge sharing. To tackle these issues, we propose an asymmetric simulation-enhanced flow reconstruction (ASE-FR) framework, which makes the following contributions: (1) a distribution-consistent flow reconstruction module that aligns available and missing modality distributions via normalizing flows; (2) an asymmetric simulation module that perturbs and randomly masks features to mimic real-world modality absence and improve robustness; (3) modal-shared knowledge distillation that transfers shared representations from teacher encoders to a student encoder through contrastive learning. This framework is applicable to a range of real-world scenarios, such as multi-sensor networks in smart manufacturing, medical diagnostic systems combining imaging and electronic health records, and autonomous driving platforms that integrate camera and LiDAR data. 
The experimental results show that our ASE-FR method achieves 94.71%, 41.85%, and 81.90% accuracy on the Audiovision-MNIST, MM-IMDb, and IEMOCAP datasets, as well as a 1.1376 error rate on the CMU-MOSI dataset, demonstrating competitive performance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112413"},"PeriodicalIF":7.6,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145046673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
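Contribution (2), the asymmetric simulation module, amounts to perturbing features and randomly zeroing whole samples per modality to mimic absence. A minimal sketch, with illustrative (assumed) drop probability and noise scale:

```python
import numpy as np

def asymmetric_simulate(features, mask_prob=0.5, noise_std=0.1, seed=0):
    """Perturb each modality's features with Gaussian noise and randomly
    zero out per-sample feature vectors to mimic a missing modality.
    mask_prob and noise_std are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    out = {}
    for name, feat in features.items():
        noisy = feat + rng.normal(0.0, noise_std, feat.shape)
        keep = rng.random(feat.shape[0]) >= mask_prob   # per-sample drop mask
        out[name] = noisy * keep[:, None]
    return out

feats = {"audio": np.ones((6, 4)), "vision": np.ones((6, 4))}
sim = asymmetric_simulate(feats)
```

Training on `sim` instead of `feats` exposes the model to the kinds of incomplete inputs it will see at test time, which is the robustness argument the abstract makes.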
{"title":"Hierarchical community-based graph generation model for improving structural diversity","authors":"Masoomeh Sadat Razavi, Abdolreza Mirzaei, Mehran Safayani","doi":"10.1016/j.patcog.2025.112320","DOIUrl":"10.1016/j.patcog.2025.112320","url":null,"abstract":"<div><div>Graph generation remains a challenging task due to the high dimensionality of graphs and the complex dependencies among their edges. Existing models often struggle to produce structurally diverse graphs. To address this limitation, we propose a novel generative framework specifically designed to capture structural diversity in graph generation. Our approach follows a sequential process: initially, a community detection algorithm partitions the input graph into distinct communities. Each community is then generated independently using deep generative models, while a dedicated module concurrently learns the interconnections between communities. To scale to graphs with a larger number of communities, we extend our approach into a hierarchical generative model. The proposed framework not only improves generation accuracy but also significantly reduces generation time for large-scale graphs. Moreover, it enables the application of prior methods that were previously incapable of handling such graphs. To highlight the shortcomings of existing approaches, we conduct experiments on a synthetic dataset comprising diverse graph structures. 
The results demonstrate substantial improvements in standard evaluation metrics as well as in the quality of the generated graphs.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112320"},"PeriodicalIF":7.6,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145026278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
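The sequential pipeline described above — partition into communities, generate each community independently, then learn inter-community links — can be sketched end to end. This toy version makes strong simplifying assumptions: communities are given rather than detected, and a simple random-edge draw replaces the deep generative models of the paper.

```python
import random

def generate_graph(communities, p_intra=0.8, p_inter=0.05, seed=0):
    """Generate intra-community edges per community independently, then a
    separate pass adds the sparser inter-community links (standing in for
    the dedicated interconnection module)."""
    rng = random.Random(seed)
    edges = set()
    for comm in communities:                      # per-community generation
        for i, u in enumerate(comm):
            for v in comm[i + 1:]:
                if rng.random() < p_intra:
                    edges.add((u, v))
    for a, ca in enumerate(communities):          # inter-community links
        for cb in communities[a + 1:]:
            for u in ca:
                for v in cb:
                    if rng.random() < p_inter:
                        edges.add((u, v))
    return edges

comms = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
g = generate_graph(comms)
```

Because each community is generated independently, the per-community step parallelizes, which is one source of the reduced generation time the abstract reports for large graphs.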
{"title":"A graph contrastive learning network for change detection with heterogeneous remote sensing images","authors":"Zhiyong Lv , Sizhe Cheng , Linfu Xie , Junhuai Li , Minghua Zhao","doi":"10.1016/j.patcog.2025.112394","DOIUrl":"10.1016/j.patcog.2025.112394","url":null,"abstract":"<div><div>Land cover change detection (LCCD) with heterogeneous remote sensing images (Hete-RSIs) is an attractive topic in the remote sensing community. Because Hete-RSIs are acquired with different remote sensors, they cannot be compared directly for LCCD owing to their different imaging modalities. In this paper, a graph contrastive learning network (GCLN) is proposed for LCCD with bitemporal Hete-RSIs. First, with the motivation of smoothing the noise and utilizing contextual information, the k-nearest neighbor algorithm is used to improve the spectral homogeneity of the pixels within a superpixel. Then, a pairwise graph is constructed on the basis of each superpixel from spectral similarity and dissimilarity perspectives, and a graph feature learning network is designed to learn the near-far dependencies of graph features for change detection. Finally, the similarity and dissimilarity loss functions are coupled as a contrastive loss function to expand the difference between similar and dissimilar features. Comparisons with seven advanced methods on five pairs of Hete-RSIs demonstrate the feasibility and superiority of the proposed GCLN for LCCD with Hete-RSIs. For example, the improvements on the five datasets are 3.63%, 8.47%, 4.17%, 8.23%, and 4.98% in terms of overall accuracy. 
The code of the proposed approach is available at: <span><span>https://github.com/ImgSciGroup/2024-GCLN</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112394"},"PeriodicalIF":7.6,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145096307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
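Coupling similarity and dissimilarity losses "to expand the difference between similar and dissimilar features" is the standard contrastive objective: similar pairs are pulled together, dissimilar pairs pushed beyond a margin. The margin form below is the textbook contrastive loss, assumed here rather than taken verbatim from GCLN.

```python
import numpy as np

def contrastive_loss(dist, same_label, margin=1.0):
    """Couple a similarity term (d^2, pulls similar pairs together) and a
    dissimilarity term (hinge on a margin, pushes dissimilar pairs apart)
    into one contrastive objective."""
    sim_term = same_label * dist ** 2
    dis_term = (1 - same_label) * np.maximum(0.0, margin - dist) ** 2
    return float(np.mean(sim_term + dis_term))

dist = np.array([0.1, 0.2, 1.5, 0.4])   # pairwise feature distances
same = np.array([1, 1, 0, 0])           # 1 = pair judged spectrally similar
loss = contrastive_loss(dist, same)
```

Note how the third pair (dissimilar, distance 1.5 > margin) contributes zero loss: once dissimilar features are far enough apart, the objective stops pushing.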
{"title":"Edge Craft Odyssey: Navigating guided super-resolution with a fast, precise, and lightweight network","authors":"Armin Mehri , Parichehr Behjati , Dario Carpio , Angel D. Sappa","doi":"10.1016/j.patcog.2025.112392","DOIUrl":"10.1016/j.patcog.2025.112392","url":null,"abstract":"<div><div>Thermal imaging technology is exceptionally valuable in environments where visibility is limited or nonexistent. However, the high cost and technological limitations of high-resolution thermal imaging sensors restrict their widespread use. Many thermal cameras are now paired with high-resolution visible cameras, which can help improve low-resolution thermal images. However, aligning thermal and visible images is challenging due to differences in their spectral ranges, making pixel-wise alignment difficult. Therefore, we present the Edge Craft Odyssey Network (ECONet), a lightweight transformer-based network designed for Guided Thermal Super-Resolution (GTSR) to address these challenges. Our approach introduces a Progressive Edge Prediction module that extracts edge features from visible images using an adaptive threshold within our innovative Edge-Weighted Gradient Blending technique. This technique provides precise control over the blending intensity between low-resolution thermal and visible images. Additionally, we introduce a lightweight Cascade Deep Feature Extractor that focuses on efficient feature extraction and edge weight highlighting, enhancing the representation of high-frequency details. Experimental results show that ECONet outperforms state-of-the-art methods across various datasets while maintaining relatively low computational and memory requirements. ECONet improves performance by 0.20 to 1.3 dB over existing methods and generates super-resolved images in a fraction of a second, approximately 91% faster than the other methods. 
The code is available at <span><span>https://github.com/Rm1n90/ECONet</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112392"},"PeriodicalIF":7.6,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145020173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
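Edge-Weighted Gradient Blending with an adaptive threshold can be sketched as below. The mean-plus-k-sigma threshold, the normalization, and the linear blend are all assumed forms chosen for illustration; ECONet's learned modules are more elaborate.

```python
import numpy as np

def edge_weighted_blend(thermal_up, visible, k=1.0, alpha=0.5):
    """Extract visible-image edges with an adaptive threshold (assumed
    form: mean + k*std of gradient magnitude), then inject them into the
    upsampled thermal image with blending strength alpha."""
    gy, gx = np.gradient(visible.astype(float))
    mag = np.hypot(gx, gy)
    thresh = mag.mean() + k * mag.std()            # adaptive threshold
    edge_weight = np.where(mag > thresh, mag, 0.0)
    edge_weight /= edge_weight.max() + 1e-8        # normalize to [0, 1]
    return (1 - alpha * edge_weight) * thermal_up + alpha * edge_weight * visible

visible = np.zeros((8, 8)); visible[:, 4:] = 1.0   # toy guide with a vertical edge
thermal_up = np.full((8, 8), 0.3)                  # upsampled low-res thermal
sr = edge_weighted_blend(thermal_up, visible)
```

Away from the edge the thermal content passes through unchanged; only where the guide image has strong gradients does visible detail get blended in, which is the "precise control over the blending intensity" the abstract claims.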
{"title":"Dual-teacher self-distillation registration for multi-modality medical image fusion","authors":"Aimei Dong , Jingyuan Xu , Long Wang","doi":"10.1016/j.patcog.2025.112373","DOIUrl":"10.1016/j.patcog.2025.112373","url":null,"abstract":"<div><div>Misaligned multimodal medical images pose challenges to the fusion task, resulting in structural distortions and edge artifacts in the fusion results. Existing registration networks primarily consider single-scale deformation fields at each stage, thereby neglecting long-range connections between non-adjacent stages. Moreover, in the fusion task, due to the quadratic computational complexity faced by Transformers during feature extraction, they are unable to effectively capture long-range correlated features. To address these problems, we propose an image registration and fusion method called DTMFusion. DTMFusion comprises two main networks: a Dual-Teacher Self-Distillation Registration (DTSDR) network and a Mamba-Conv-based Fusion (MCF) network. The registration network employs a pyramid progressive architecture to generate independent deformation fields at each layer. We introduce a dual-teacher self-distillation scheme that leverages past learning history and the current network structure as teacher guidance to constrain the generated deformation fields. For the fusion network, we introduce Mamba to address the quadratic complexity problem of Transformers. Specifically, the fusion network involves two key components: the Shallow Fusion Module (SFM) and the Cross-Modality Fusion Module (CFM). The SFM achieves lightweight cross-modality interaction through channel exchange, while the CFM leverages inherent cross-modality relationships to enhance the representation capability of fusion results. Through the collaborative effort of these components, the network can effectively integrate cross-modality complementary information and maintain appropriate apparent strength from a global perspective. 
Extensive experimental analysis demonstrates the superiority of this method in fusing misaligned medical images.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112373"},"PeriodicalIF":7.6,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145010712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
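The SFM's "lightweight cross-modality interaction through channel exchange" is, at its simplest, swapping a fraction of feature channels between the two modality streams. Which channels get exchanged (the first half, below) is an assumption for illustration.

```python
import numpy as np

def channel_exchange(feat_a, feat_b, ratio=0.5):
    """Swap the first `ratio` fraction of channels between two modality
    feature maps (C, H, W) for lightweight cross-modality interaction.
    The choice of channels to exchange is an assumption."""
    c = feat_a.shape[0]
    k = int(c * ratio)
    out_a, out_b = feat_a.copy(), feat_b.copy()
    out_a[:k], out_b[:k] = feat_b[:k].copy(), feat_a[:k].copy()
    return out_a, out_b

mri = np.zeros((4, 8, 8))   # toy modality-A features
ct = np.ones((4, 8, 8))     # toy modality-B features
x, y = channel_exchange(mri, ct)
```

The appeal of channel exchange is that it adds zero parameters and zero FLOPs beyond a copy, which is why it suits a "shallow" fusion stage.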
{"title":"HopGAT: A multi-hop graph attention network with heterophily and degree awareness","authors":"Han Zhang, Huan Wang, Mingjing Han","doi":"10.1016/j.patcog.2025.112387","DOIUrl":"10.1016/j.patcog.2025.112387","url":null,"abstract":"<div><div>In highly heterophilic graphs, where nodes frequently connect across categories, the attention mechanism, which dynamically adjusts neighboring node weights, may struggle to capture intricate node relationships. Furthermore, first-hop neighbor information is usually insufficient to encompass the global structure, but using multi-hop neighbors increases complexity. To address these challenges, we propose HopGAT, a multi-hop graph attention network with heterophily and degree awareness. Firstly, we design heterophily-based neighbor sampling to sequentially filter high-hop neighbors by degree. Next, to obtain comprehensive global information, we construct a multi-hop recursive learning method with head and tail attention vectors to learn multi-hop neighbor features. Finally, we combine the average node degree of the graph with hop decay modeling to learn importance coefficients at different hops and adaptively aggregate the learned multi-hop features. Experimental results demonstrate that HopGAT significantly improves performance across 9 benchmark datasets with varying heterophily and average degrees.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112387"},"PeriodicalIF":7.6,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145004638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
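Hop-decayed multi-hop aggregation can be sketched with a fixed geometric decay standing in for HopGAT's learned, degree-aware importance coefficients (that substitution, and mean aggregation in place of attention, are the assumptions here).

```python
import numpy as np

def multi_hop_aggregate(adj, feats, hops=3, decay=0.5):
    """Mean-aggregate features hop by hop and combine them with
    importance coefficients decay**h. HopGAT learns these coefficients
    from the graph's average degree; a fixed geometric decay is assumed."""
    deg = adj.sum(axis=1, keepdims=True)
    norm_adj = adj / np.maximum(deg, 1)        # row-normalized mean aggregation
    out = feats.astype(float).copy()           # hop 0: the node itself
    hop_feat = feats.astype(float)
    for h in range(1, hops + 1):
        hop_feat = norm_adj @ hop_feat         # push features one more hop
        out += (decay ** h) * hop_feat
    return out

adj = np.array([[0, 1, 0],                     # a 3-node path graph 0-1-2
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = np.eye(3)                              # one-hot node features
agg = multi_hop_aggregate(adj, feats)
```

With one-hot inputs, row `i` of `agg` reads off exactly how much each hop contributes to node `i`, which makes the decay schedule easy to inspect.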
{"title":"Understanding and tackling the modality imbalance problem in multimodal survival prediction","authors":"Chicheng Zhou , Minghui Wang , Yi Shi , Anli Zhang , Ao Li","doi":"10.1016/j.patcog.2025.112398","DOIUrl":"10.1016/j.patcog.2025.112398","url":null,"abstract":"<div><div>Benefiting from the in-depth integration of multimodal data, survival prediction has emerged as a pivotal task in cancer prognosis by facilitating personalized treatment planning and medical resource allocation. In this study, we report an intriguing phenomenon of inter-modality capability gap (ICG) enlargement during joint survival modelling of genomics data and pathology images. This observation, supported by our dedicated theoretical analysis, uncovers a previously unrecognized modality imbalance problem, where the pathology modality suffers from limited gradient propagation and insufficient learning while the genomics modality dominates in reducing survival loss. To tackle this problem, we further propose a balanced multimodal learning approach for survival prediction named BMLSurv, which introduces two innovative auxiliary learning strategies: self-enhancement learning (SEL) and peer-assistance learning (PAL). The SEL strategy exploits a real-time imbalance measure to guide extra task-aware supervision, therefore dynamically strengthening pathology-specific gradient propagation in a self-enhanced manner. Meanwhile, the PAL strategy leverages the stronger genomics modality as a “helpful peer” to assist the sufficient learning of the pathology modality via a new risk-ranking distillation technique. Extensive experiments on representative cancer datasets demonstrate that by successfully addressing the modality imbalance problem, BMLSurv remarkably narrows the ICG in joint survival modelling and consistently outperforms state-of-the-art methods by a large margin. 
These results underscore the potential of BMLSurv to advance multimodal survival prediction and enhance clinical decision-making in cancer prognosis.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112398"},"PeriodicalIF":7.6,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144997397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
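A risk-ranking distillation loss of the kind PAL describes can be sketched as a pairwise margin loss: the student (pathology branch) is penalized whenever it disagrees with the teacher's (genomics branch's) risk ordering. The exact formulation in BMLSurv is not given in the abstract, so this margin-ranking form is an illustrative assumption.

```python
import numpy as np

def risk_ranking_distill(student_risk, teacher_risk, margin=0.0):
    """For every patient pair ordered by the teacher's risk, penalize the
    student when its risk difference violates that ordering (margin
    ranking loss; assumed form, not the paper's exact objective)."""
    n = len(teacher_risk)
    losses = []
    for i in range(n):
        for j in range(n):
            if teacher_risk[i] > teacher_risk[j]:   # teacher says i is riskier
                diff = student_risk[i] - student_risk[j]
                losses.append(max(0.0, margin - diff))
    return sum(losses) / max(len(losses), 1)

teacher = np.array([0.9, 0.1, 0.5])   # genomics ("helpful peer") risk scores
student = np.array([0.8, 0.2, 0.4])   # pathology risks with the same ordering
loss = risk_ranking_distill(student, teacher)
```

Distilling rankings rather than raw scores matches the survival setting, where concordance of risk orderings (not absolute risk values) drives the evaluation.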