Lijian Yang , Jianxun Mi , Weisheng Li , Guofen Wang , Bin Xiao
{"title":"Improving the sparse coding model via hybrid Gaussian priors","authors":"Lijian Yang , Jianxun Mi , Weisheng Li , Guofen Wang , Bin Xiao","doi":"10.1016/j.patcog.2024.111102","DOIUrl":"10.1016/j.patcog.2024.111102","url":null,"abstract":"<div><div>Sparse Coding (SC) imposes a sparse prior on the representation coefficients under a dictionary or a sensing matrix. However, the sparse regularization, approximately expressed as the $L_1$-norm, is not strongly convex, so the uniqueness of the optimal solution requires the dictionary to have low mutual coherence. As a specialized form of SC, Convolutional Sparse Coding (CSC) encounters the same issue. Inspired by the Elastic Net, this paper proposes to learn an additional anisotropic Gaussian prior for the sparse codes, thus improving the convexity of the SC problem and enabling the modeling of feature correlation. As a result, the SC problem is modified by the proposed elastic projection. We then analyze the effectiveness of the proposed method under the framework of LISTA and demonstrate that this simple technique has the potential to correct bad codes and reduce the error bound, especially in noisy scenarios. Furthermore, we extend this technique to the CSC model for the practical vision task of image denoising. Extensive experimental results show that the learned Gaussian prior significantly improves the performance of both the SC and CSC models. 
Source code is available at https://github.com/eeejyang/EPCSCNet.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111102"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
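The core idea in this abstract, adding a Gaussian (quadratic) prior to the $L_1$ penalty so that the sparse coding objective becomes strongly convex, can be illustrated with the elastic-net proximal step inside ISTA. This is a minimal NumPy sketch under simplifying assumptions: it uses a fixed isotropic prior with hypothetical weights `lam1` and `lam2`, not the learned anisotropic prior, LISTA unrolling or CSC model of the paper, and all function names are illustrative.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||z||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def elastic_net_prox(v, t, lam1, lam2):
    """Proximal operator of t * (lam1 * ||z||_1 + lam2 / 2 * ||z||_2^2).

    The isotropic Gaussian (L2) term only rescales the soft-thresholded value
    by 1 / (1 + t * lam2); it is this term that makes the objective strongly convex.
    """
    return soft_threshold(v, t * lam1) / (1.0 + t * lam2)


def ista_elastic_net(D, x, lam1, lam2, n_iter=200):
    """ISTA for min_z 0.5 * ||x - D z||^2 + lam1 * ||z||_1 + lam2 / 2 * ||z||^2."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the data-term gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        z = elastic_net_prox(z - step * grad, step, lam1, lam2)
    return z


# Two perfectly coherent (identical) atoms: with lam2 = 0 any nonnegative split of the
# weight between them is optimal; with lam2 > 0 the minimizer is unique (an equal split).
D = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
x = np.array([1.0, 0.0])
z = ista_elastic_net(D, x, lam1=0.1, lam2=0.1)
print(np.round(z, 4))
```

For this toy dictionary the strongly convex objective has the unique minimizer $(3/7, 3/7, 0)$, which the sketch reaches; without the quadratic term the $L_1$ problem would have infinitely many solutions, which is the mutual-coherence issue the abstract describes.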
Jiahui Wang , Dongsheng Ruan , Yang Li , Zefeng Wang , Yongquan Wu , Tao Tan , Guang Yang , Mingfeng Jiang
{"title":"Data augmentation strategies for semi-supervised medical image segmentation","authors":"Jiahui Wang , Dongsheng Ruan , Yang Li , Zefeng Wang , Yongquan Wu , Tao Tan , Guang Yang , Mingfeng Jiang","doi":"10.1016/j.patcog.2024.111116","DOIUrl":"10.1016/j.patcog.2024.111116","url":null,"abstract":"<div><div>Exploiting augmentations of both unlabeled and labeled data has become considerably important for semi-supervised medical image segmentation. However, existing data augmentation methods, such as CutMix and generative models, typically depend on consistency regularization or ignore the correlation between slices. To address these cognitive biases, we propose two novel data augmentation strategies and a Dual Attention-guided Consistency network (DACNet) to significantly improve semi-supervised medical image segmentation performance. For labeled data augmentation, we randomly crop and stitch annotated data, rather than unlabeled data, to create mixed annotated data, which breaks the anatomical structures and introduces voxel-level uncertainty into the limited annotated data. For unlabeled data augmentation, we combine a diffusion model with the Laplacian pyramid fusion strategy to generate unlabeled data with higher slice correlation. To encourage the decoders to learn semantically different but discriminative features, we propose DACNet, which achieves structural differentiation by introducing spatial and channel attention into the decoders. Extensive experiments demonstrate the effectiveness and generalization of our approach. Specifically, our proposed labeled and unlabeled data augmentation strategies improved accuracy by 0.3% to 16.49% and 0.22% to 1.72%, respectively, when compared with various state-of-the-art semi-supervised methods. Furthermore, our DACNet outperforms existing methods on three medical datasets (91.72% Dice score with 20% labeled data on the LA dataset). 
Source code will be publicly available at https://github.com/Oubit1/DACNet.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111116"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
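The labeled-data augmentation in this abstract (randomly cropping and stitching annotated volumes rather than unlabeled ones) can be pictured as a CutMix-style operation applied to image and mask together. This NumPy sketch assumes a single cuboid copied from one labeled volume into another; the crop fraction, the single-box layout and the function name are assumptions, not the DACNet implementation.

```python
import numpy as np


def crop_and_stitch(img_a, lbl_a, img_b, lbl_b, crop_frac=0.5, rng=None):
    """Paste a random cuboid of labeled volume B (image and mask) into labeled volume A.

    The same box is applied to the image and its annotation, so the mixed sample
    keeps voxel-accurate labels while breaking the anatomical structure.
    """
    rng = np.random.default_rng() if rng is None else rng
    shape = np.array(img_a.shape)
    size = np.maximum((shape * crop_frac).astype(int), 1)  # box edge length per axis
    start = [int(rng.integers(0, s - c + 1)) for s, c in zip(shape, size)]
    box = tuple(slice(st, st + c) for st, c in zip(start, size))
    mixed_img, mixed_lbl = img_a.copy(), lbl_a.copy()  # leave the inputs untouched
    mixed_img[box] = img_b[box]
    mixed_lbl[box] = lbl_b[box]
    return mixed_img, mixed_lbl


# Toy volumes (made up): A is all zeros, B is all ones, so the pasted box is easy to count.
rng = np.random.default_rng(0)
img_a, lbl_a = np.zeros((8, 8, 8)), np.zeros((8, 8, 8), dtype=np.int64)
img_b, lbl_b = np.ones((8, 8, 8)), np.ones((8, 8, 8), dtype=np.int64)
mixed_img, mixed_lbl = crop_and_stitch(img_a, lbl_a, img_b, lbl_b, rng=rng)
print(int(mixed_lbl.sum()))  # 64: a 4x4x4 box taken from volume B
```

Keeping image and label edits locked to the same slice tuple is the design point: unlike mixing unlabeled data, no pseudo-label is needed for the stitched region.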
Wenjie Mao , Bin Yu , Chen Zhang , A.K. Qin , Yu Xie
{"title":"FedKT: Federated learning with knowledge transfer for non-IID data","authors":"Wenjie Mao , Bin Yu , Chen Zhang , A.K. Qin , Yu Xie","doi":"10.1016/j.patcog.2024.111143","DOIUrl":"10.1016/j.patcog.2024.111143","url":null,"abstract":"<div><div>Federated Learning enables clients to train a joint model collaboratively without disclosing raw data. However, learning over non-IID data may degrade performance, which has become a fundamental bottleneck. Despite numerous efforts to address this issue, challenges such as excessive local computational burdens and reliance on shared data persist, rendering existing methods impractical in real-world scenarios. In this paper, we propose a novel federated knowledge transfer framework to overcome data heterogeneity. Specifically, a model segmentation distillation method and a learnable aggregation network are developed for server-side knowledge ensembling and transfer, while a client-side consistency-constrained loss is devised to rectify local updates, thereby enhancing both the global and client models. The framework considers both diversity and consistency among clients and can serve as a general solution for extracting knowledge from distributed nodes. 
Extensive experiments on four datasets demonstrate our framework’s effectiveness, achieving superior performance compared to advanced competitors in high-heterogeneity settings.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111143"},"PeriodicalIF":7.5,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wentao He , Jianfeng Ren , Ruibin Bai , Xudong Jiang
{"title":"Radar gait recognition using Dual-branch Swin Transformer with Asymmetric Attention Fusion","authors":"Wentao He , Jianfeng Ren , Ruibin Bai , Xudong Jiang","doi":"10.1016/j.patcog.2024.111101","DOIUrl":"10.1016/j.patcog.2024.111101","url":null,"abstract":"<div><div>Video-based gait recognition suffers from potential privacy issues and from performance degradation caused by dim environments, partial occlusions, or camera view changes. Radar has recently become increasingly popular because it overcomes various challenges faced by vision sensors. To capture tiny differences between the radar gait signatures of different people, a dual-branch Swin Transformer is proposed, where one branch captures the time variations of the radar micro-Doppler signature and the other captures the repetitive frequency patterns in the spectrogram. Unlike natural images, where objects can be translated, rotated, or scaled, the spatial coordinates of spectrograms and cadence velocity diagrams (CVDs) have unique physical meanings, and there is no affine transformation for radar targets in these synthetic images. The patch splitting mechanism of the Vision Transformer makes it well suited to extracting discriminant information from patches and learning attentive information across patches, as each patch carries some unique physical properties of radar targets. Swin Transformer consists of a set of cascaded Swin blocks that extract semantic features from shallow to deep representations, further improving the classification performance. Lastly, to highlight the branch with larger discriminant power, an Asymmetric Attention Fusion is proposed to optimally fuse the discriminant features from the two branches. To enrich the research on radar gait recognition, a large-scale NTU-RGR dataset is constructed, containing 45,768 radar frames of 98 subjects. The proposed method is evaluated on the NTU-RGR dataset and the MMRGait-1.0 database. It consistently and significantly outperforms all the compared methods on both datasets. 
The code is available at https://github.com/wentaoheunnc/NTU-RGR.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111101"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
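The two radar inputs named in this abstract are commonly derived as follows: a short-time Fourier transform of the radar return gives the micro-Doppler spectrogram, and a Fourier transform of each Doppler bin's time history gives the cadence velocity diagram, which exposes repetitive (gait-cycle) frequencies. The NumPy sketch below assumes a complex baseband signal, a Hann window and illustrative window/hop sizes; it is not the NTU-RGR preprocessing pipeline, and the synthetic signal is made up for demonstration.

```python
import numpy as np


def spectrogram(signal, win_len=128, hop=32):
    """Magnitude STFT: rows are Doppler bins, columns are time frames."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + win_len] * window for i in range(n_frames)], axis=1
    )
    # fftshift puts zero Doppler in the middle row (approaching vs. receding motion)
    return np.abs(np.fft.fftshift(np.fft.fft(frames, axis=0), axes=0))


def cadence_velocity_diagram(spec):
    """FFT along time for every Doppler bin: rows are Doppler bins, columns are cadence frequencies."""
    centred = spec - spec.mean(axis=1, keepdims=True)  # drop each bin's static (zero-cadence) level
    return np.abs(np.fft.fft(centred, axis=1))


# Hypothetical micro-Doppler return: a limb swinging at a 2 Hz cadence, sampled at 1 kHz.
fs = 1000.0
t = np.arange(4000) / fs
signal = np.exp(1j * 20.0 * np.sin(2 * np.pi * 2.0 * t))
spec = spectrogram(signal)
cvd = cadence_velocity_diagram(spec)
print(spec.shape, cvd.shape)  # (128, 122) (128, 122)
```

Both maps share the Doppler axis but differ on the other axis (time versus cadence frequency), which is consistent with the abstract's point that their coordinates carry physical meaning and should not be treated like natural-image pixels.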
{"title":"MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model","authors":"Zhenghao Zhang , Shengfan Zhang , Zuozhuo Dai , Zilong Dong , Siyu Zhu","doi":"10.1016/j.patcog.2024.111100","DOIUrl":"10.1016/j.patcog.2024.111100","url":null,"abstract":"<div><div>Current state-of-the-art techniques for video object segmentation require extensive training on video datasets with mask annotations, which constrains their ability to transfer zero-shot to new image distributions and tasks. However, recent advances in foundation models, particularly for image segmentation, have shown robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study explores the potential of vision foundation models with diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial–temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. 
Extensive experiments on the DAVIS2017-unsupervised, YoutubeVIS19&21 and OIVS datasets demonstrate that the proposed approach, without mask supervision, outperforms existing mask-supervised methods, and that it generalizes to weakly annotated video datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111100"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Jointly stochastic fully symmetric interpolatory rules and local approximation for scalable Gaussian process regression","authors":"Hongli Zhang, Jinglei Liu","doi":"10.1016/j.patcog.2024.111125","DOIUrl":"10.1016/j.patcog.2024.111125","url":null,"abstract":"<div><div>When exploring the broad application prospects of large-scale Gaussian process regression (GPR), three core challenges significantly constrain its effectiveness. First, the $O(n^3)$ time complexity of inverting the covariance matrix of $n$ training points becomes a severe performance bottleneck on large-scale datasets. Second, although traditional local approximation methods are widely used, they are often limited by inconsistent predictions. Third, many aggregation strategies do not discriminate well when evaluating the importance of experts (i.e., local models), which reduces overall prediction accuracy. To address these challenges, this article proposes a comprehensive method, TDSFSIRLA, that integrates third-degree stochastic fully symmetric interpolatory rules (TDSFSI), local approximation, and Tsallis mutual information, aiming to overcome these limitations. Specifically, TDSFSIRLA first introduces an efficient third-degree stochastic fully symmetric interpolatory rule, which accurately approximates Gaussian kernel functions by generating adaptive-dimensional feature maps. This not only significantly reduces the number of required orthogonal nodes, lowering computational cost, but also maintains very high approximation accuracy, providing a solid theoretical foundation for processing large-scale datasets. 
Furthermore, to overcome the inconsistency of local approximation methods, this paper adopts the Generalized Robust Bayesian Committee Machine (GRBCM) as the aggregation framework for local experts. Through its inherent consistency and robustness, GRBCM keeps the predictions of the local models consistent with one another, significantly improving the stability and reliability of the overall prediction. More importantly, to address the uneven distribution of expert weights, this article introduces Tsallis mutual information as a metric for weight allocation. Because Tsallis mutual information is sensitive to information complexity, it assigns each local expert a weight that matches its contribution, reducing the prediction bias caused by uneven weight distribution and further improving prediction accuracy. In the experimental phase, this article conducts comprehensive tests on multiple synthetic datasets and seven representative real datasets. The results show that TDSFSIRLA not only significantly reduces time complexity but also achieves excellent prediction accuracy, verifying its advantages and broad application prospects in the field of large-scale Gaussi","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111125"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
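The cost argument in this abstract is that replacing the exact Gaussian kernel with an explicit finite-dimensional feature map avoids the $O(n^3)$ inversion. The paper builds its feature map from third-degree stochastic fully symmetric interpolatory rules; the sketch below swaps in the simpler, better-known random Fourier features (Monte Carlo sampling of the kernel's spectral density) purely to illustrate the principle, and it omits the local experts, GRBCM aggregation and Tsallis weighting entirely. The lengthscale, noise level and feature count are assumptions.

```python
import numpy as np


def gaussian_kernel(X, Y, lengthscale):
    """Exact kernel k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * lengthscale**2))


def random_fourier_features(X, W, b):
    """Feature map z(x) = sqrt(2 / D) * cos(W x + b), so that z(x) . z(y) ~ k(x, y)."""
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)


def rff_gp_mean(X_train, y_train, X_test, n_features=500, lengthscale=1.0, noise=1e-2, seed=0):
    """Approximate GP posterior mean; solves a D x D system instead of an n x n one."""
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the Gaussian kernel's spectral density N(0, lengthscale^-2 I)
    W = rng.normal(0.0, 1.0 / lengthscale, size=(n_features, X_train.shape[1]))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    Z = random_fourier_features(X_train, W, b)
    A = Z.T @ Z + noise * np.eye(n_features)  # O(n D^2) to build, O(D^3) to solve
    weights = np.linalg.solve(A, Z.T @ y_train)
    return random_fourier_features(X_test, W, b) @ weights


# Usage on a toy 1-D problem (made-up data): fit sin(x) from 200 noiseless samples.
X = np.linspace(0.0, 6.0, 200)[:, None]
y = np.sin(X[:, 0])
pred = rff_gp_mean(X, y, X)
print(float(np.mean(np.abs(pred - y))))
```

The trade-off is the same one the abstract targets: accuracy depends on how well the feature map approximates the kernel, which is why a deterministic-plus-stochastic rule with fewer nodes can be preferable to plain Monte Carlo sampling.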
Hong-Hanh Nguyen-Le , Lam Tran , Dinh Song An Nguyen , Nhien-An Le-Khac , Thuc Nguyen
{"title":"Privacy-preserving speaker verification system using Ranking-of-Element hashing","authors":"Hong-Hanh Nguyen-Le , Lam Tran , Dinh Song An Nguyen , Nhien-An Le-Khac , Thuc Nguyen","doi":"10.1016/j.patcog.2024.111107","DOIUrl":"10.1016/j.patcog.2024.111107","url":null,"abstract":"<div><div>Advances in automatic speaker recognition have led to the use of voice data in verification systems. This raises concerns about the security of storing voice templates in plaintext. In this paper, we propose a novel cancellable biometric scheme that does not require users to manage random matrices or tokens. First, we pre-process the raw voice data and feed it into a deep feature extraction module to obtain embeddings. Next, we propose a hashing scheme, Ranking-of-Elements, which generates compact hashed codes by recording the number of elements whose values are lower than that of a random element. This approach captures more information from smaller-valued elements and prevents an adversary from guessing the ranking value through Attacks via Record Multiplicity. Lastly, we introduce a fuzzy matching method to mitigate variations in templates caused by environmental noise. We evaluate the performance and security of our method on two datasets: TIMIT and VoxCeleb1.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111107"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
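The Ranking-of-Elements idea, storing how many elements are smaller than a randomly chosen element instead of the embedding values themselves, can be sketched in a few lines of NumPy. This is a hypothetical illustration: the subset size, the number of hash values, the seeded generator (used here only for reproducibility; the abstract says the real scheme does not require users to manage tokens, but not how its randomness is derived) and the tolerance-based matcher are all assumptions rather than the authors' design.

```python
import numpy as np


def ranking_of_elements_hash(embedding, n_hashes=64, subset_size=16, seed=0):
    """Hypothetical RoE-style hash of a speaker embedding.

    For each hash value, pick a random subset of positions and one random pivot
    element inside it; the code is the number of subset elements below the pivot.
    Only ranks are stored, never the raw embedding values.
    """
    rng = np.random.default_rng(seed)
    codes = np.empty(n_hashes, dtype=np.int64)
    for j in range(n_hashes):
        idx = rng.choice(embedding.shape[0], size=subset_size, replace=False)
        values = embedding[idx]
        pivot = values[rng.integers(subset_size)]  # the randomly chosen reference element
        codes[j] = int(np.sum(values < pivot))
    return codes


def fuzzy_match_score(codes_a, codes_b, tolerance=1):
    """Fraction of hash values that agree within +/- tolerance (assumed matching rule)."""
    return float(np.mean(np.abs(codes_a - codes_b) <= tolerance))


# Made-up embeddings: a noisy probe from the same speaker and an unrelated impostor.
rng = np.random.default_rng(42)
enrolled = rng.normal(size=256)
probe = enrolled + 0.05 * rng.normal(size=256)
impostor = rng.normal(size=256)
template = ranking_of_elements_hash(enrolled)
genuine_score = fuzzy_match_score(template, ranking_of_elements_hash(probe))
impostor_score = fuzzy_match_score(template, ranking_of_elements_hash(impostor))
print(genuine_score, impostor_score)
```

Because only comparisons are recorded, any strictly increasing rescaling of the embedding yields the same codes, and small perturbations shift ranks by at most a step or two, which is what a tolerance-based fuzzy matcher can absorb.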
{"title":"HDR reconstruction from a single exposure LDR using texture and structure dual-stream generation","authors":"Yu-Hsiang Chen, Shanq-Jang Ruan","doi":"10.1016/j.patcog.2024.111127","DOIUrl":"10.1016/j.patcog.2024.111127","url":null,"abstract":"<div><div>Reconstructing high dynamic range (HDR) imagery from a single low dynamic range (LDR) photograph presents substantial challenges. These arise primarily from the loss of detail in underexposed or overexposed regions, caused by the quantization and saturation inherent to camera sensors. Traditional learning-based approaches often struggle to distinguish overexposed regions within an object from the background, compromising detail retention in these critical areas. Our methodology focuses on meticulously reconstructing structural and textural details to preserve the integrity of the structural information. We propose a new two-stage model architecture for HDR image reconstruction, consisting of a dual-stream network and a feature fusion stage. The dual-stream network reconstructs structural and textural details, while the feature fusion stage uses the reconstructed information to minimize artifacts. We demonstrate that our proposed method outperforms other state-of-the-art single-image HDR reconstruction algorithms on various quality metrics.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111127"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yu Qian , Qijin Wang , Changxin Wu , Chao Wang , Long Cheng , Yating Hu , Hongqiang Wang
{"title":"Apply prior feature integration to sparse object detectors","authors":"Yu Qian , Qijin Wang , Changxin Wu , Chao Wang , Long Cheng , Yating Hu , Hongqiang Wang","doi":"10.1016/j.patcog.2024.111103","DOIUrl":"10.1016/j.patcog.2024.111103","url":null,"abstract":"<div><div>Using noisy boxes as queries for sparse object detection has become a popular research topic in recent years. Sparse R-CNN achieves one-to-one prediction from noisy boxes to object boxes, while DiffusionDet turns the prediction process of Sparse R-CNN into multiple diffusion processes. In particular, Sparse R-CNN and its improved versions all rely on an FPN to extract features for RoI Align. However, each target matches only one feature map in the FPN, which is inefficient and resource-consuming. Moreover, these sparse detectors crop regions from the noisy boxes for prediction, so the boxes fail to capture global features. In this work, we rethink the paradigm of sparse object detection and propose two improvements, producing a new object detector called Prior Sparse R-CNN. First, we replace the original FPN neck with a neck that outputs only one feature map to improve efficiency. Then, we place an aggregated encoder after the neck to address the object scale problem through dilated residual blocks and feature aggregation. The other improvement is that we introduce prior knowledge for noisy boxes to enhance their understanding of global representations. We design a Region Generation Network (RGN) to generate global object information and fuse it with the features of the noisy boxes as prior knowledge. Prior Sparse R-CNN reaches a state-of-the-art 47.0 AP on the COCO 2017 validation set, surpassing DiffusionDet by 1.5 AP with a ResNet-50 backbone. 
Additionally, our training epoch requires only 3/5 of the time.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111103"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhize Wu , Yue Ding , Long Wan , Teng Li , Fudong Nian
{"title":"Local and global self-attention enhanced graph convolutional network for skeleton-based action recognition","authors":"Zhize Wu , Yue Ding , Long Wan , Teng Li , Fudong Nian","doi":"10.1016/j.patcog.2024.111106","DOIUrl":"10.1016/j.patcog.2024.111106","url":null,"abstract":"<div><div>The currently successful paradigm for skeleton-based action recognition combines Graph Convolutional Networks (GCNs), which model spatial correlations, with Temporal Convolutional Networks (TCNs), which extract motion features. Such GCN-TCN-based approaches usually rely on local graph convolution operations, which limits their ability to capture complicated correlations among distant joints and to represent long-range dependencies. Although self-attention, which originated in Transformers, shows great potential for modeling correlations among all joints, Transformer-based methods are usually computationally expensive and ignore the physical connectivity structure of the human skeleton. To address these issues, we propose a novel Local-Global Self-Attention Enhanced Graph Convolutional Network (LG-SGNet) to simultaneously learn both local and global representations in the spatial–temporal dimension. Our approach consists of three components. The Local-Global Graph Convolutional Network (LG-GCN) module extracts local and global spatial feature representations through parallel channel-specific global and local spatial modeling. The Local-Global Temporal Convolutional Network (LG-TCN) module performs joint-wise global temporal modeling using multi-head self-attention in parallel with local temporal modeling. This constitutes a new multi-branch temporal convolution structure that effectively captures both long-range dependencies and subtle temporal structures. 
Finally, the Dynamic Frame Weighting Module (DFWM) adjusts the weights of skeleton action sequence frames, allowing the model to adaptively focus on the features of representative frames for more efficient action recognition. Extensive experiments demonstrate that LG-SGNet performs very competitively with state-of-the-art methods. Our project website is available at https://github.com/DingYyue/LG-SGNet.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111106"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}