Instant pose extraction based on mask transformer for occluded person re-identification
Ting-Ting Yuan, Qing-Ling Shu, Si-Bao Chen, Li-Li Huang, Bin Luo
Pattern Recognition 159 (2024), Article 111082. DOI: 10.1016/j.patcog.2024.111082. Published 2024-10-22.

Abstract: Re-identification (Re-ID) of occluded pedestrians is a daunting task, primarily because pedestrians are frequently obscured by obstacles such as buildings, vehicles, and other pedestrians. To address this challenge, we propose a novel approach named Instant Pose Extraction based on Mask Transformer (MTIPE), tailored specifically for occluded person Re-ID. MTIPE consists of several new modules: a Mask Aware Module (MAM) that aligns the overall prototype with the occluded image; a Multi-headed Attention Constraint Module (MACM) that enriches the feature representation; a Pose Aggregation Module (PAM) that separates useful human information from occlusion noise; a Feature Matching Module (FMM) for matching non-occluded parts; learnable local prototypes introduced into a local prototype-based transformer decoder; a Pooling Attention Module (PAM) that replaces the traditional self-attention module to better extract and propagate local contextual information; and a Pose Key-points Loss for better matching of non-occluded body parts. Comprehensive experimental evaluations and comparisons show that MTIPE delivers encouraging performance improvements on both occluded and holistic person Re-ID tasks. Its results surpass, or at least match, those of current state-of-the-art methods in various respects, highlighting its potential advantages and promising application prospects.
{"title":"Fine-grained Automatic Augmentation for handwritten character recognition","authors":"Wei Chen, Xiangdong Su, Hongxu Hou","doi":"10.1016/j.patcog.2024.111079","DOIUrl":"10.1016/j.patcog.2024.111079","url":null,"abstract":"<div><div>With the advancement of deep learning-based character recognition models, the training data size has become a crucial factor in improving the performance of handwritten text recognition. For languages with low-resource handwriting samples, data augmentation methods can effectively scale up the data size and improve the performance of handwriting recognition models. However, existing data augmentation methods for handwritten text face two limitations: (1) Methods based on global spatial transformations typically augment the training data by transforming each word sample as a whole but ignore the potential to generate fine-grained transformation from local word areas, limiting the diversity of the generated samples; (2) It is challenging to adaptively choose a reasonable augmentation parameter when applying these methods to different language datasets. To address these issues, this paper proposes Fine-grained Automatic Augmentation (FgAA) for handwritten character recognition. Specifically, FgAA views each word sample as composed of multiple strokes and achieves data augmentation by performing fine-grained transformations on the strokes. Each word is automatically segmented into various strokes, and each stroke is fitted with a Bézier curve. On such a basis, we define the augmentation policy related to the fine-grained transformation and use Bayesian optimization to select the optimal augmentation policy automatically, thereby achieving the automatic augmentation of handwriting samples. Experiments on seven handwriting datasets of different languages demonstrate that FgAA achieves the best augmentation effect for handwritten character recognition. Our code is available at <span><span>https://github.com/IMU-MachineLearningSXD/Fine-grained-Automatic-Augmentation</span><svg><path></path></svg></span></div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111079"},"PeriodicalIF":7.5,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142553017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Piecewise convolutional neural network relation extraction with self-attention mechanism
Bo Zhang, Li Xu, Ke-Hao Liu, Ru Yang, Mao-Zhen Li, Xiao-Yang Guo
Pattern Recognition 159 (2024), Article 111083. DOI: 10.1016/j.patcog.2024.111083. Published 2024-10-18.

Abstract: The task of relation extraction in natural language processing is to identify the relation between two specified entities in a sentence. However, existing methods do not fully utilize word feature information and pay little attention to how strongly each word influences the relation extraction result. To address these issues, we propose a relation extraction method based on a self-attention mechanism (SPCNN-VAE). First, we use a multi-head self-attention mechanism to process word vectors and generate sentence feature representations, which capture the semantic dependencies between words in a sentence. Then, we introduce word positions, combining the sentence feature representation with the position feature representation of each word to form the input representation of a piecewise convolutional neural network (PCNN). Furthermore, to identify the word feature information most useful for relation extraction, an attention-based pooling operation is employed to capture key convolutional features and classify the feature vectors. Finally, regularization is performed with a variational autoencoder (VAE) to enhance the model's ability to encode word information features. Performance analysis is conducted on SemEval-2010 Task 8, and the experimental results show that the proposed relation extraction model is effective and outperforms several competitive baselines.
Understanding adversarial robustness against on-manifold adversarial examples
Jiancong Xiao, Liusha Yang, Yanbo Fan, Jue Wang, Zhi-Quan Luo
Pattern Recognition 159 (2024), Article 111071. DOI: 10.1016/j.patcog.2024.111071. Published 2024-10-18.

Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples: a well-trained model can be easily attacked by adding small perturbations to the original data. One hypothesis for the existence of adversarial examples is the off-manifold assumption: adversarial examples lie off the data manifold. However, recent research has shown that on-manifold adversarial examples also exist. In this paper, we revisit the off-manifold assumption and ask: to what extent is the poor adversarial robustness of neural networks due to on-manifold adversarial examples? Since the true data manifold is unknown in practice, we consider two approximate constructions of on-manifold adversarial examples, on both real and synthetic datasets. On real datasets, we show that on-manifold adversarial examples achieve higher attack rates than off-manifold ones against both standard-trained and adversarially trained models. On synthetic datasets, we prove theoretically that on-manifold adversarial examples are powerful, yet adversarial training focuses on off-manifold directions and ignores them. Furthermore, our analysis shows that the theoretically derived properties can also be observed in practice. These findings suggest that on-manifold adversarial examples are important and deserve more attention when training robust models.
{"title":"Self-supervised learning from images: No negative pairs, no cluster-balancing","authors":"Jian-Ping Mei, Shixiang Wang, Miaoqi Yu","doi":"10.1016/j.patcog.2024.111081","DOIUrl":"10.1016/j.patcog.2024.111081","url":null,"abstract":"<div><div>Learning with self-derived targets provides a non-contrastive method for unsupervised image representation learning, where the variety in targets is crucial. Recent work has achieved good performance by learning with targets obtained via cluster-balancing. However, the equal-cluster-size constraint becomes too restrictive for handling data with imbalanced categories or coming in small batches. In this paper, we propose a new clustering-based approach for non-contrastive image representation learning with no need for a particular architecture design or extra memory bank and no explicit constraints on cluster size. A key formulation is to learn embedding consistency and variable decorrelation in the cluster space by tweaking the batch-wise cross-correlation matrix towards an identity one. With this identitization loss incorporated, predicted cluster assignments of two randomly augmented views of the same image serve as targets for each other. We carried out comprehensive experimental studies of linear classification with learned representations of benchmark image datasets. Our results show that the proposed approach significantly outperforms state-of-the-art approaches and is more robust to class imbalance than those with cluster balancing.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111081"},"PeriodicalIF":7.5,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature-matching method based on keypoint response constraint using binary encoding of phase congruency
Xiaomin Liu, Qiqi Li, Yuzhe Hu, Jeng-Shyang Pan, Huaqi Zhao, Donghua Yuan, Jun-Bao Li
Pattern Recognition 159 (2024), Article 111078. DOI: 10.1016/j.patcog.2024.111078. Published 2024-10-16.

Abstract: The cross-view geo-localization (CGL) task remains far from practical, mainly because of intensity differences between images from different sensors. In this study, we propose a learning-based feature-matching framework with binary encoding of phase congruency to address these intensity differences. First, an autoencoder-weighted fusion method is used to obtain an intensity-aligned image that makes the two images from different sensors comparable. Second, the keypoint responses of the two images are computed using binary encoding based on phase congruency theory, and these responses are used to construct the feature-matching method. This approach exploits the invariance of phase information in weak-texture images, using it to compute keypoint responses with higher distinguishability and matchability. Finally, using the two intensity-aligned images, a loss function based on the binary encoding of the phase congruency keypoint response is employed to optimize the keypoint detector and feature descriptor and obtain the corresponding keypoint sets of the two images. The experimental results show that the improved feature matching is superior to existing methods and solves the problem of view differences in object matching. The code can be found at https://github.com/lqq-dot/FMPCKR.
UPT-Flow: Multi-scale transformer-guided normalizing flow for low-light image enhancement
Lintao Xu, Changhui Hu, Yin Hu, Xiaoyuan Jing, Ziyun Cai, Xiaobo Lu
Pattern Recognition 158 (2024), Article 111076. DOI: 10.1016/j.patcog.2024.111076. Published 2024-10-11.

Abstract: Low-light images often suffer from information loss and RGB value degradation due to extremely low or nonuniform lighting conditions. Many existing methods primarily focus on optimizing the appearance distance between the enhanced image and the normal-light image, while neglecting to explicitly model the information-loss regions and incorrect information points in low-light images. To address this, this paper proposes an Unbalanced Points-guided multi-scale Transformer-based conditional normalizing Flow (UPT-Flow) for low-light image enhancement. We design an unbalanced point map prior based on the differences in the proportions of RGB values at each pixel in the image, which is used to modify traditional self-attention and mitigate the negative effects of information-distorted areas in the attention calculation. The Multi-Scale Transformer (MSFormer) is composed of several global-local transformer blocks, which encode rich global contextual information and local fine-grained details for the conditional normalizing flow. In the invertible flow network, we design cross-coupling conditional affine layers based on channel and spatial attention, enhancing the expressive power of a single flow step. Without bells and whistles, extensive experiments on low-light image enhancement, night traffic monitoring enhancement, low-light object detection, and nighttime image segmentation demonstrate that our proposed method achieves state-of-the-art performance across a variety of real-world scenes. The code and pre-trained models will be available at https://github.com/NJUPT-IPR-XuLintao/UPT-Flow.
CEDNet: A cascade encoder-decoder network for dense prediction
Gang Zhang, Ziyi Li, Chufeng Tang, Jianmin Li, Xiaolin Hu
Pattern Recognition 158 (2024), Article 111072. DOI: 10.1016/j.patcog.2024.111072. Published 2024-10-10.

Abstract: The prevailing methods for dense prediction tasks typically utilize a heavy classification backbone to extract multi-scale features and then fuse these features using a lightweight module. However, these methods allocate most computational resources to the classification backbone, which delays the multi-scale feature fusion and potentially leads to inadequate feature fusion. Although some methods perform feature fusion from early stages, they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder-decoder network, named CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder-decoder structure and perform multi-scale feature fusion within each decoder, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder-decoder structures: Hourglass, UNet, and FPN, all of which yielded promising results. Experiments on various dense prediction tasks demonstrated the effectiveness of our method.
EENet: An effective and efficient network for single image dehazing
Yuning Cui, Qiang Wang, Chaopeng Li, Wenqi Ren, Alois Knoll
Pattern Recognition 158 (2024), Article 111074. DOI: 10.1016/j.patcog.2024.111074. Published 2024-10-10.

Abstract: While numerous solutions leveraging convolutional neural networks and Transformers have been proposed for image dehazing, there remains significant potential to improve the balance between efficiency and reconstruction performance. In this paper, we introduce an efficient and effective network named EENet, designed for image dehazing through enhanced spatial-spectral learning. EENet comprises three primary modules: the frequency processing module, the spatial processing module, and the dual-domain interaction module. Specifically, the frequency processing module handles Fourier components individually, based on their distinct properties for image dehazing, while also modeling global dependencies according to the convolution theorem. Additionally, the spatial processing module is designed to enable multi-scale learning. Finally, the dual-domain interaction module promotes information exchange between the frequency and spatial domains. Extensive experiments demonstrate that EENet achieves state-of-the-art performance on seven synthetic and real-world datasets for image dehazing. Moreover, the network's generalization ability is validated by extending it to image desnowing, image defocus deblurring, and low-light image enhancement.