Fan Yang, Xinqi Liu, Fumin Ma, Xiaojian Ding, Kaixiang Wang
{"title":"Online Asymmetric Supervised Discrete Cross-Modal Hashing for Streaming Multimedia Data","authors":"Fan Yang, Xinqi Liu, Fumin Ma, Xiaojian Ding, Kaixiang Wang","doi":"10.1016/j.patcog.2025.111604","DOIUrl":"10.1016/j.patcog.2025.111604","url":null,"abstract":"<div><div>Cross-modal online hashing, which incrementally retrains the hash function on newly received data, has become a research hotspot as a means of handling the massive amounts of streaming data brought about by the fast growth of multimedia technology and the popularity of portable devices. However, most methods for processing streaming data have two shortcomings. On the one hand, the relationship between modal classes and the common features between label vectors and binary codes are not fully explored. On the other hand, the semantic information in the old and new data modes is not fully utilized. In this paper, we propose Online Asymmetric Supervised Discrete Cross-Modal Hashing for Streaming Multimedia Data (OASCH) as a solution. This study integrates the concept cognition mechanism of dynamic incremental samples and an asymmetric knowledge guidance mechanism into the online hash learning framework. The proposed model takes into account the knowledge similarity between newly arriving data and the existing dataset, as well as the knowledge similarity within the new data itself. It projects the hash codes of newly arriving samples into the latent space of concept cognition. By doing so, the model maximizes the mining of implicit semantic similarities within streaming data across different time points, yielding compact hash codes with enhanced discriminative power. We further propose an adaptive edge regression strategy. 
Experiments on three publicly available multimedia retrieval datasets show that our method surpasses several state-of-the-art cross-modal hashing techniques in both retrieval efficiency and search accuracy.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111604"},"PeriodicalIF":7.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rank-revealing fully-connected tensor network decomposition and its application to tensor completion","authors":"Yun-Yang Liu , Xi-Le Zhao , Gemine Vivone","doi":"10.1016/j.patcog.2025.111610","DOIUrl":"10.1016/j.patcog.2025.111610","url":null,"abstract":"<div><div>Fully-connected tensor network (FCTN) decomposition has become a powerful tool for handling high-dimensional data. However, for given <span><math><mi>N</mi></math></span>th-order data, choosing the <span><math><mrow><mi>N</mi><mrow><mo>(</mo><mi>N</mi><mo>−</mo><mn>1</mn><mo>)</mo></mrow><mo>/</mo><mn>2</mn></mrow></math></span> tuning parameters (i.e., the FCTN rank) in FCTN decomposition is a tricky challenge, which hinders its wide deployment. Although many recent works have emerged to adaptively search for a (near-)optimal FCTN rank, these methods suffer from expensive computational costs since they require too many search and evaluation processes, significantly limiting their application to high-dimensional data. To tackle the above challenges, we develop a rank-revealing FCTN (revealFCTN) decomposition, whose FCTN rank is adaptively and efficiently inferred. More specifically, by analyzing the sizes of the sub-network tensors in the FCTN decomposition, we establish the equivalent relationships between the FCTN rank and the ranks of single-mode and double-mode unfolding matrices of the given data. The FCTN rank can be directly revealed through the ranks of these unfolding matrices, which does not require any search and evaluation process, making the computational cost almost negligible compared to the search-based methods. To evaluate the developed revealFCTN decomposition, we test it on a representative task: tensor completion (TC). 
Comprehensive experimental results demonstrate that our method outperforms several state-of-the-art methods, achieving an MPSNR gain of around 1 dB in most cases compared to the original FCTN decomposition.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111610"},"PeriodicalIF":7.5,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lintao Zhang , Jinjian Wu , Lihong Wang , Li Wang , David C. Steffens , Shijun Qiu , Guy G. Potter , Mingxia Liu
{"title":"Brain anatomy prior modeling to forecast clinical progression of cognitive impairment with structural MRI","authors":"Lintao Zhang , Jinjian Wu , Lihong Wang , Li Wang , David C. Steffens , Shijun Qiu , Guy G. Potter , Mingxia Liu","doi":"10.1016/j.patcog.2025.111603","DOIUrl":"10.1016/j.patcog.2025.111603","url":null,"abstract":"<div><div>Brain structural MRI has been widely used to assess the future progression of cognitive impairment (CI). Previous learning-based studies usually suffer from the issue of small-sized labeled training data, while a huge amount of structural MRIs exist in large-scale public databases. Intuitively, brain anatomical structures derived from these public MRIs (even without task-specific label information) can boost CI progression trajectory prediction. However, previous studies seldom use such brain anatomy structure information as priors. To this end, this paper proposes a brain anatomy prior modeling (BAPM) framework to forecast the clinical progression of cognitive impairment with small-sized target MRIs by exploring anatomical brain structures. Specifically, the BAPM consists of a <em>pretext model</em> and a <em>downstream model</em>, with a shared brain anatomy-guided encoder to model brain anatomy prior using auxiliary tasks explicitly. Besides the encoder, the pretext model also contains two decoders for two auxiliary tasks (<em>i.e.</em>, MRI reconstruction and brain tissue segmentation), while the downstream model relies on a predictor for classification. The brain anatomy-guided encoder is pre-trained with the pretext model on 9,344 auxiliary MRIs without diagnostic labels for anatomy prior modeling. With this encoder frozen, the downstream model is then fine-tuned on limited target MRIs for prediction. We validate BAPM on two CI-related studies with T1-weighted MRIs from 448 subjects. 
Experimental results suggest the effectiveness of BAPM in (1) four CI progression prediction tasks, (2) MR image reconstruction, and (3) brain tissue segmentation, compared with several state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111603"},"PeriodicalIF":7.5,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptively robust high-order tensor factorization for low-rank tensor reconstruction","authors":"Zihao Song , Yongyong Chen , Zhao Weihua","doi":"10.1016/j.patcog.2025.111600","DOIUrl":"10.1016/j.patcog.2025.111600","url":null,"abstract":"<div><div>Recently, various approaches have been proposed for tensor reconstruction from incomplete and contaminated data. However, most algorithms focus on third-order tensors, neglecting higher-order tensors that are common in real-world applications. Additionally, many studies use LASSO-type penalties or second-order statistics to capture noise patterns, which may not perform well with dense and gross outliers. To address these challenges, we propose a novel robust high-order tensor recovery model that simultaneously removes complex noise and completes missing entries. We introduce a factor Frobenius norm for the low-rank structures of high-order tensors and derive a nonconvex function via the <span><math><msub><mrow><mi>L</mi></mrow><mrow><mn>2</mn></mrow></msub></math></span> criterion. An estimation algorithm is developed using the alternating minimization method. Our method jointly estimates tensor terms of interest and precision parameters, adapting to noise patterns for data-driven robustness. 
We analyze the convergence properties of our algorithm, and numerical experiments validate its superiority in natural image reconstruction, video restoration, and background modeling compared to state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111600"},"PeriodicalIF":7.5,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSDC-NET: Semi-supervised superficial OCTA vessel segmentation for false positive reduction","authors":"Xinyi Liu, Hailan Shen, Wenyan Zhong, Wanqing Xiong, Zailiang Chen","doi":"10.1016/j.patcog.2025.111592","DOIUrl":"10.1016/j.patcog.2025.111592","url":null,"abstract":"<div><div>Accurate vessel segmentation in Optical Coherence Tomography Angiography (OCTA) is essential for ocular disease diagnosis, monitoring, and treatment assessment. However, most current automatic segmentation methods overlook false positives in the segmentation results, leading to potential misdiagnosis and delayed treatment. To address this issue, we propose a Dynamic Spatial Semi-Supervised Vessel Segmentation with Dual Topological Consistency (DSDC-NET) for retinal superficial OCTA images. The network integrates a Dynamic Spatial Attention Mechanism that combines snake-shaped convolution, which captures tubular fine structures, with spatial attention to suppress background noise and artefacts. This design enhances vessel region responses while accurately capturing complex local structures, thereby reducing false positives arising from inaccurate localisation of vessel details. Furthermore, Dual Topological Consistency Loss integrates the Persistent Homology features of the vessel system with the topological skeleton features of major vessels, enhancing branching pattern recognition. A Warm-up mechanism balances the focus of the network between major and branch vessels across training phases, mitigating false positives from inadequate branching structure learning. Comprehensive evaluations on ROSE-1, OCTA-500, and ROSSA datasets demonstrate the superiority of DSDC-NET over existing methods. 
Notably, DSDC-NET effectively reduces the false discovery rate and improves segmentation accuracy, validating its effectiveness in reducing false positives.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111592"},"PeriodicalIF":7.5,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disentanglement and codebook learning-induced feature match network to diagnose neurodegenerative diseases on incomplete multimodal data","authors":"Wei Xiong , Tao Wang , Xiumei Chen , Yue Zhang , Wencong Zhang , Qianjin Feng , Meiyan Huang , Alzheimer’s Disease Neuroimaging Initiative","doi":"10.1016/j.patcog.2025.111597","DOIUrl":"10.1016/j.patcog.2025.111597","url":null,"abstract":"<div><div>Multimodal data can provide complementary information to diagnose neurodegenerative diseases (NDs). However, image quality variations and high costs can result in the missing data problem. Although incomplete multimodal data can be projected onto a common space, the traditional projection process may increase alignment errors and lose some modality-specific information. A disentanglement and codebook learning-induced feature match network (DCFMnet) is proposed in this study to solve the aforementioned issues. First, multimodal data are disentangled into latent modality-common and -specific features to help preserve modality-specific information in the subsequent alignment of multimodal data. Second, the latent modal features of all available data are aligned into a common space to reduce alignment errors and fused to achieve ND diagnosis. Moreover, the latent modal features of the modality with missing data are explored in online updated feature codebooks. 
Last, DCFMnet is tested on two publicly available datasets to illustrate its excellent performance in ND diagnosis.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111597"},"PeriodicalIF":7.5,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mo Guan , Yan Wang , Guangkun Ma , Jiarui Liu , Mingzu Sun
{"title":"MSKA: Multi-stream keypoint attention network for sign language recognition and translation","authors":"Mo Guan , Yan Wang , Guangkun Ma , Jiarui Liu , Mingzu Sun","doi":"10.1016/j.patcog.2025.111602","DOIUrl":"10.1016/j.patcog.2025.111602","url":null,"abstract":"<div><div>Sign language serves as a non-vocal means of communication, transmitting information and significance through gestures, facial expressions, and bodily movements. The majority of current approaches for sign language recognition (SLR) and translation rely on RGB video inputs, which are vulnerable to fluctuations in the background. Employing a keypoint-based strategy not only mitigates the effects of background alterations but also substantially diminishes the computational demands of the model. Nevertheless, contemporary keypoint-based methodologies fail to fully harness the implicit knowledge embedded in keypoint sequences. To tackle this challenge, our inspiration is derived from the human cognition mechanism, which discerns sign language by analyzing the interplay between gesture configurations and supplementary elements. We propose a multi-stream keypoint attention network to depict a sequence of keypoints produced by a readily available keypoint estimator. In order to facilitate interaction across multiple streams, we investigate diverse methodologies such as keypoint fusion strategies, head fusion, and self-distillation. The resulting framework is denoted as MSKA-SLR, which is expanded into a sign language translation (SLT) model through the straightforward addition of an extra translation network. We carry out comprehensive experiments on well-known benchmarks like Phoenix-2014, Phoenix-2014T, and CSL-Daily to showcase the efficacy of our methodology. Notably, we have attained a novel state-of-the-art performance in the sign language translation task of Phoenix-2014T. 
The code and models can be accessed at: <span><span>https://github.com/sutwangyan/MSKA</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111602"},"PeriodicalIF":7.5,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weipeng Jing , Junze Wang , Donglin Di , Dandan Li , Yang Song , Lei Fan
{"title":"Multi-modal hypergraph contrastive learning for medical image segmentation","authors":"Weipeng Jing , Junze Wang , Donglin Di , Dandan Li , Yang Song , Lei Fan","doi":"10.1016/j.patcog.2025.111544","DOIUrl":"10.1016/j.patcog.2025.111544","url":null,"abstract":"<div><div>Self-supervised learning (SSL) has become a dominant approach in multi-modal medical image segmentation. However, existing methods, such as Seq SSL and Joint SSL, suffer from catastrophic forgetting and conflicts in representation learning across different modalities. To address these challenges, we propose a two-stage SSL framework, HyCon, for multi-modal medical image segmentation. It combines the advantages of Seq and Joint SSL using knowledge distillation to align similar topological samples across modalities. In the first stage, cross-modal features are learned through adversarial learning. Inspired by the Graph Foundation Models and further adapted to our task, the Hypergraph Contrastive Learning Network (HCLN) with a teacher-student architecture is subsequently introduced to capture high-order relationships across modalities by integrating hypergraphs with contrastive learning. The Topology Hybrid Distillation (THD) module distills topological information, contextual features, and relational knowledge into the student model. We evaluated HyCon on two organs, lung and brain. Our framework outperformed state-of-the-art SSL methods, achieving significant improvements in segmentation with limited labeled data. Both quantitative and qualitative experiments validate the effectiveness of the design of our framework. 
Code is available at: <span><span>https://github.com/reeive/HyCon</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111544"},"PeriodicalIF":7.5,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143680958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrated subset selection and bandwidth estimation algorithm for geographically weighted regression","authors":"Hyunwoo Lee , Young Woong Park","doi":"10.1016/j.patcog.2025.111589","DOIUrl":"10.1016/j.patcog.2025.111589","url":null,"abstract":"<div><div>This study proposes a mathematical programming-based algorithm for the integrated selection of variable subsets and bandwidth estimation in geographically weighted regression, a local regression method that allows the kernel bandwidth and regression coefficients to vary across study areas. Unlike standard approaches in the literature, in which bandwidth and regression parameters are estimated separately for each focal point on the basis of different criteria, our model uses a single objective function for the integrated estimation of regression and bandwidth parameters across all focal points, based on the regression likelihood function and variance modeling. The proposed model further integrates a procedure to select a single subset of independent variables for all focal points, whereas existing approaches may return heterogeneous subsets across focal points. We then propose an alternative direction method to solve the nonconvex mathematical model and show that it converges to a partial minimum. 
The computational experiment indicates that the proposed algorithm provides competitive explanatory power with stable spatially varying patterns, with the ability to select the best subset and account for additional constraints.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111589"},"PeriodicalIF":7.5,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143697609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning position-aware implicit neural network for real-world face inpainting","authors":"Bo Zhao, Huan Yang, Jianlong Fu","doi":"10.1016/j.patcog.2025.111598","DOIUrl":"10.1016/j.patcog.2025.111598","url":null,"abstract":"<div><div>Face inpainting requires the model to have a precise global understanding of the facial position structure. Benefiting from the powerful capabilities of deep learning backbones, recent works in face inpainting have achieved decent performance in the ideal setting (square shape with 512px). However, existing methods often produce visually unpleasant results, especially in position-sensitive details (e.g., eyes and nose), when directly applied to arbitrary-shaped images in real-world scenarios. These visually unpleasant position-sensitive details indicate the shortcomings of existing methods in processing position information. In this paper, we propose an <strong>I</strong>mplicit <strong>N</strong>eural <strong>I</strong>npainting <strong>N</strong>etwork (IN<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>) to handle arbitrary-shape face images in real-world scenarios by explicitly modeling position information. Specifically, a downsample processing encoder is proposed to reduce information loss while obtaining the global semantic feature. A neighbor hybrid attention block is proposed with a hybrid attention mechanism to improve the model’s facial understanding ability without restricting the input’s shape. Finally, an implicit neural pyramid decoder is introduced to explicitly model position information and bridge the gap between low-resolution features and high-resolution output. 
Our method achieves optimal facial image restoration performance on both the CelebA-HQ and LFW datasets, as well as on the downstream task of face verification, introducing a more efficient face inpainting algorithm to the fields of image editing software and intelligent security.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111598"},"PeriodicalIF":7.5,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143759848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}