{"title":"Topology reorganized graph contrastive learning with mitigating semantic drift","authors":"Jiaqiang Zhang, Songcan Chen","doi":"10.1016/j.patcog.2024.111160","DOIUrl":"10.1016/j.patcog.2024.111160","url":null,"abstract":"<div><div>Graph contrastive learning (GCL) is an effective paradigm for node representation learning in graphs. The key components hidden behind GCL are data augmentation and positive–negative pair selection. Typical data augmentations in GCL, such as uniform deletion of edges, are generally blind and resort to local perturbation, which is prone to producing under-diversity views. Additionally, there is a risk of making the augmented data traverse to other classes. Moreover, most methods always treat all other samples as negatives. Such a negative pairing naturally results in sampling bias and likewise may make the learned representation suffer from semantic drift. Therefore, to increase the diversity of the contrastive view, we propose two simple and effective global topological augmentations to compensate current GCL. One is to mine the semantic correlation between nodes in the feature space. The other is to utilize the algebraic properties of the adjacency matrix to characterize the topology by eigen-decomposition. With the help of both, we can retain important edges to build a better view. To reduce the risk of semantic drift, a prototype-based negative pair selection is further designed which can filter false negative samples. Extensive experiments on various tasks demonstrate the advantages of the model compared to the state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111160"},"PeriodicalIF":7.5,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive representation learning and sample weighting for low-quality 3D face recognition","authors":"Cuican Yu , Fengxun Sun , Zihui Zhang , Huibin Li , Liming Chen , Jian Sun , Zongben Xu","doi":"10.1016/j.patcog.2024.111161","DOIUrl":"10.1016/j.patcog.2024.111161","url":null,"abstract":"<div><div>3D face recognition (3DFR) algorithms have advanced significantly in the past two decades by leveraging facial geometric information, but they mostly focus on high-quality 3D face scans, thus limiting their practicality in real-world scenarios. Recently, with the development of affordable consumer-level depth cameras, the focus has shifted towards low-quality 3D face scans. In this paper, we propose a method for low-quality 3DFR. On one hand, our approach employs the normalizing flow to model an adaptive-form distribution for any given 3D face scan. This adaptive distributional representation learning strategy allows for more robust representations of low-quality 3D face scans (which may be caused by the scan noises, pose or occlusion variations, etc.). On the other hand, we introduce an adaptive sample weighting strategy to adjust the importance of each training sample by measuring both the difficulty of being recognized and the data quality. This adaptive sample weighting strategy can further enhance the robustness of the deep model and meanwhile improve its performance on low-quality 3DFR. Through comprehensive experiments, we demonstrate that our method can significantly improve the performance of low-quality 3DFR. For example, our method achieves competitive results on both the IIIT-D database and the Lock3DFace datasets, underscoring its effectiveness in addressing the challenges associated with low-quality 3D faces.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111161"},"PeriodicalIF":7.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corruption-based anomaly detection and interpretation in tabular data","authors":"Chunghyup Mok , Seoung Bum Kim","doi":"10.1016/j.patcog.2024.111149","DOIUrl":"10.1016/j.patcog.2024.111149","url":null,"abstract":"<div><div>Recent advances in self-supervised learning (SSL) have proven crucial in effectively learning representations of unstructured data, encompassing text, images, and audio. Although the applications of these advances in anomaly detection have been explored extensively, applying SSL to tabular data presents challenges because of the absence of prior information on data structure. In response, we propose a framework for anomaly detection in tabular datasets using variable corruption. Through selective variable corruption and assignment of new labels based on the degree of corruption, our framework can effectively distinguish between normal and abnormal data. Furthermore, analyzing the impact of corruption on anomaly scores aids in the identification of important variables. Experimental results obtained from various tabular datasets validate the precision and applicability of the proposed method. The source code can be accessed at <span><span>https://github.com/mokch/CAIT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111149"},"PeriodicalIF":7.5,"publicationDate":"2024-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Two-stage Rule-induction visual reasoning on RPMs with an application to video prediction","authors":"Wentao He , Jianfeng Ren , Ruibin Bai , Xudong Jiang","doi":"10.1016/j.patcog.2024.111151","DOIUrl":"10.1016/j.patcog.2024.111151","url":null,"abstract":"<div><div>Raven’s Progressive Matrices (RPMs) are frequently used in evaluating human’s visual reasoning ability. Researchers have made considerable efforts in developing systems to automatically solve the RPM problem, often through a black-box end-to-end convolutional neural network for both visual recognition and logical reasoning tasks. Based on the intrinsic natures of RPM problem, we propose a Two-stage Rule-Induction Visual Reasoner (TRIVR), which consists of a perception module and a reasoning module, to tackle the challenges of real-world visual recognition and subsequent logical reasoning tasks, respectively. For the reasoning module, we further propose a “2+1” formulation that models human’s thinking in solving RPMs and significantly reduces the model complexity. It derives a reasoning rule from each RPM sample, which is not feasible for existing methods. As a result, the proposed reasoning module is capable of yielding a set of reasoning rules modeling human in solving the RPM problems. To validate the proposed method on real-world applications, an RPM-like Video Prediction (RVP) dataset is constructed, where visual reasoning is conducted on RPMs constructed using real-world video frames. Experimental results on various RPM-like datasets demonstrate that the proposed TRIVR achieves a significant and consistent performance gain compared with state-of-the-art models.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"160 ","pages":"Article 111151"},"PeriodicalIF":7.5,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142722944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An effective multi-scale interactive fusion network with hybrid Transformer and CNN for smoke image segmentation","authors":"Kang Li , Feiniu Yuan , Chunmei Wang","doi":"10.1016/j.patcog.2024.111177","DOIUrl":"10.1016/j.patcog.2024.111177","url":null,"abstract":"<div><div>Smoke has visually elusive appearances, especially in low-light conditions, so it is quite difficult to quickly and accurately detect smoke from images. To address these challenges, we design a dual-encoder structure of Transformer and Convolutional Neural Network (CNN) to propose an effective Multi-scale Interactive Fusion Network (MIFNet) for smoke image segmentation. To improve the presentation of features, we propose a Local Feature Enhancement Propagation (LFEP) module to enhance spatial details. To optimize global and local features for efficient fusion, we integrate LFEP into the original Transformer to replace the traditional multi-head self-attention mechanism. Then, we propose a Multi-level Attention Coupled Module (MACM) to fuse Transformer and CNN features of the dual-encoder. MACM can flexibly focus on information interaction between different levels of two encoding paths. Finally, we design a Prior-guided Multi-scale Fusion Decoder (PMFD), which combines prior knowledge with a multi-scale feature fusion strategy to improve the performance of segmentation. Experimental results demonstrate that MIFNet substantially outperforms the state-of-the-art methods. MIFNet achieves a mean Intersection over Union (mIoU) of 81.6 % on the synthetic smoke (SYN70 K) dataset, and a remarkable accuracy of 98.3 % on the forest smoke dataset.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111177"},"PeriodicalIF":7.5,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142699954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online indoor visual odometry with semantic assistance under implicit epipolar constraints","authors":"Yang Chen , Lin Zhang , Shengjie Zhao , Yicong Zhou","doi":"10.1016/j.patcog.2024.111150","DOIUrl":"10.1016/j.patcog.2024.111150","url":null,"abstract":"<div><div>Among solutions to the tasks of indoor localization and reconstruction, compared with traditional SLAM (Simultaneous Localization And Mapping), learning-based VO (Visual Odometry) has gained more and more popularity due to its robustness and low cost. However, the performance of existing indoor deep VOs is still limited in comparison with their outdoor counterparts mainly owing to large areas of textureless regions and complex indoor motions containing much more rotations. In this paper, the above two challenges are carefully tackled with the proposed SEOVO (Semantic Epipolar-constrained Online VO). On the one hand, as far as we know, SEOVO is the first semantic-aided VO under an online adaptive framework, which adaptively reconstructs low-texture planes without any supervision. On the other hand, we introduce the epipolar geometric constraint in an implicit way for improving the accuracy of pose estimation without destroying the global scale consistency. The efficiency and efficacy of SEOVO have been corroborated by extensive experiments conducted on both public datasets and our collected video sequences.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111150"},"PeriodicalIF":7.5,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSCIMABNet: A novel multi-head attention depthwise separable CNN model for skin cancer detection","authors":"Hatice Catal Reis , Veysel Turk","doi":"10.1016/j.patcog.2024.111182","DOIUrl":"10.1016/j.patcog.2024.111182","url":null,"abstract":"<div><div>Skin cancer is a common type of cancer worldwide. Early diagnosis of skin cancer can reduce the risk of death by increasing treatment success. However, it is challenging for dermatologists or specialists because the symptoms are vague in the early stages and cannot be noticed by the naked eye. This study examines digital diagnostic techniques supported by artificial intelligence, focusing on early skin cancer detection and two methods have been proposed. In the first method, DSCIMABNet deep learning architecture was developed by combining multi-head attention and depthwise separable convolution techniques. This model provides flexibility in learning the dataset's local features, abstract concepts, and long-term relationships. The DSCIMABNet model and modern deep learning models trained on ImageNet are proposed to be combined with the ensemble learning method in the second method. This approach provides a comprehensive feature extraction process that will increase the performance of the classification process with ensemble learning. The proposed approaches are trained and evaluated on the ISIC 2018 dataset with image enhancement applied in preprocessing. In the experimental results, DSCIMABNet achieved 84.28% accuracy, while the proposed hybrid method achieved 99.40% accuracy. Moreover, on the Mendeley dataset (CNN for Melanoma Detection Data), DSCIMABNet achieved 92.58% accuracy, while the hybrid method achieved 99.37% accuracy. This study may significantly contribute to developing new and effective methods for the early diagnosis and treatment of skin cancer.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111182"},"PeriodicalIF":7.5,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PRSN: Prototype resynthesis network with cross-image semantic alignment for few-shot image classification","authors":"Mengping Dong , Fei Li , Zhenbo Li , Xue Liu","doi":"10.1016/j.patcog.2024.111122","DOIUrl":"10.1016/j.patcog.2024.111122","url":null,"abstract":"<div><div>Few-shot image classification aims to learn novel classes with limited labeled samples for each class. Recent research mainly focuses on reconstructing a query image from a support set. However, most methods overlook the nearest semantic base parts of support samples, leading to higher intra-class semantic variation. To address this issue, we propose a novel prototype resynthesis network (PRSN) for few-shot image classification that includes global-level and local-level branches. Firstly, the prototype is compounded from semantically similar base parts to enhance the representation. Then, the query set is used to reconstruct the prototypes, further reducing intra-class variations. Additionally, we design a cross-image semantic alignment to enforce global-level and local-level semantic consistency between different query images of the same class. Our empirical results demonstrate that PRSN achieves remarkable performance across a range of widely recognized benchmarks. For instance, our method outperforms the second-best by 0.69% under 5-way 1-shot settings with ResNet-12 backbone on the <em>mini</em>ImageNet dataset.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111122"},"PeriodicalIF":7.5,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forget to Learn (F2L): Circumventing plasticity–stability trade-off in continuous unsupervised domain adaptation","authors":"Mohamed Abubakr Hassan, Chi-Guhn Lee","doi":"10.1016/j.patcog.2024.111139","DOIUrl":"10.1016/j.patcog.2024.111139","url":null,"abstract":"<div><div>In continuous unsupervised domain adaptation (CUDA), deep learning models struggle with the stability-plasticity trade-off—where the model must forget old knowledge to acquire new one. This paper introduces the “Forget to Learn” (F2L), a novel framework that circumvents such a trade-off. In contrast to state-of-the-art methods that aim to balance the two conflicting objectives, stability and plasticity, F2L utilizes active forgetting and knowledge distillation to circumvent the conflict’s root causes. In F2L, dual-encoders are trained, where the first encoder – the ‘Specialist’ – is designed to actively forget, thereby boosting adaptability (i.e., plasticity) and generating high-accuracy pseudo labels on the new domains. Such pseudo labels are then used to transfer/accumulate the specialist knowledge to the second encoder—the ‘Generalist’ through conflict-free knowledge distillation. Empirical and ablation studies confirmed F2L’s superiority on different datasets and against different SOTAs. Furthermore, F2L minimizes the need for hyperparameter tuning, enhances computational and sample efficiency, and excels in problems with long domain sequences—key advantages for practical systems constrained by hardware limitations.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111139"},"PeriodicalIF":7.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Highly realistic synthetic dataset for pixel-level DensePose estimation via diffusion model","authors":"Jiaxiao Wen, Tao Chu, Qiong Liu","doi":"10.1016/j.patcog.2024.111137","DOIUrl":"10.1016/j.patcog.2024.111137","url":null,"abstract":"<div><div>Generating training data with pixel-level annotations for DensePose is a labor-intensive task, resulting in sparse labeling in real-world datasets. Prior solutions have relied on specialized data generation systems to synthesize datasets. However, these synthetic datasets often lack realism and rely on expensive resources such as human body models and texture mappings. In this paper, we address these challenges by introducing a novel data generation method based on the diffusion model, effectively producing highly realistic data without the need for expensive resources. Specifically, our method comprises annotation generation and image generation. Utilizing graphic renderers and SMPL models, we produce synthetic annotations solely based on human poses and shapes. Subsequently, guided by these annotations, we employ simple yet effective textual prompts to generate a wide range of realistic images using the diffusion model. Our experiments conducted on DensePose-COCO dataset demonstrate the superiority of our method compared to existing methods. Code and benchmarks will be released.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111137"},"PeriodicalIF":7.5,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}