Title: DDOWOD: DiffusionDet for open-world object detection
Authors: Jiaqi Fan, Enming Zhang, Ying Wei, Yuefeng Wang, Jiakun Xia, Junwei Liu, Xinghong Liu, Shuailei Ma
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 170-177. DOI: 10.1016/j.patrec.2024.10.002

Abstract: Open-world object detection (OWOD) poses a significant challenge in computer vision, requiring models to detect unknown objects and incrementally learn new categories. To explore this field, we propose DDOWOD, built on DiffusionDet. Because it randomly generates boxes and reconstructs the characteristics of the ground truth from them, it is more likely to cover unknown objects hidden in the background and can reduce the model's bias towards known classes during training. To address the insufficient quality of pseudo-labels, which reduces accuracy on unknown classes, we use the Segment Anything Model (SAM) as the teacher in distillation learning to endow DDOWOD with rich visual knowledge. Surprisingly, compared with other existing models, DDOWOD is better suited to using SAM as the teacher. Furthermore, we propose Stepwise Distillation (SD), a new incremental learning method specialized for DDOWOD that avoids catastrophic forgetting during training by utilizing all previously trained models from past tasks rather than relying solely on the last one. DDOWOD achieves excellent performance: U-Recall of 53.2, 51.5, and 50.7 on the OWOD splits and U-AP of 21.9 on IntensiveSet.
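The abstract does not give the Stepwise Distillation objective, so the following is a minimal sketch of the stated idea only: the current student is regularized against all frozen models from past tasks, not just the most recent one. The `model(images) -> logits` interface and the temperature-scaled KL form are assumptions.

```python
import torch
import torch.nn.functional as F

def stepwise_distillation_loss(student_logits, past_teachers, images, temperature=2.0):
    """Sketch of Stepwise Distillation (SD): distill from ALL previously
    trained task models to mitigate catastrophic forgetting. The exact loss
    used in the paper is not specified in the abstract; this is illustrative."""
    loss = 0.0
    for teacher in past_teachers:                      # one frozen model per past task
        with torch.no_grad():
            t_logits = teacher(images)                 # assumed signature: model(images) -> logits
        k = t_logits.shape[-1]                         # distill only on the classes this teacher knows
        loss += F.kl_div(
            F.log_softmax(student_logits[..., :k] / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
    return loss / max(len(past_teachers), 1)
```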

Title: Pseudo-label refinement via hierarchical contrastive learning for source-free unsupervised domain adaptation
Authors: Deng Li, Jianguang Zhang, Kunhong Wu, Yucheng Shi, Yahong Han
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 236-242. DOI: 10.1016/j.patrec.2024.10.006

Abstract: Source-free unsupervised domain adaptation aims to adapt a source model to an unlabeled target domain without accessing the source data, due to privacy considerations. Existing works mainly address the problem with self-training and representation learning. However, they typically learn representations at a single semantic level and barely exploit rich hierarchical semantic information to obtain clear decision boundaries, which makes it hard to achieve satisfactory generalization. In this paper, we propose a novel hierarchical contrastive domain adaptation algorithm that applies self-supervised contrastive learning to both fine-grained instance and coarse-grained cluster semantics. On the one hand, we propose an adaptive prototype pseudo-labeling strategy to obtain more reliable labels. On the other hand, we propose hierarchical contrastive representation learning at both the fine-grained instance level and the coarse-grained cluster level to reduce the negative effect of label noise and stabilize the whole training procedure. Extensive experiments on primary unsupervised domain adaptation benchmarks demonstrate the effectiveness of the proposed method.
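A common form of prototype-based pseudo-labeling, sketched below as one plausible reading of the adaptive prototype strategy (the paper's exact formulation is not given in the abstract): build class prototypes as prediction-weighted means of target features, then re-assign each sample to its nearest prototype.

```python
import torch
import torch.nn.functional as F

def prototype_pseudo_labels(features, probs):
    """Illustrative prototype pseudo-labeling: prototypes are soft-prediction-
    weighted means of target features; labels are reassigned by cosine
    similarity to the nearest prototype."""
    feats = F.normalize(features, dim=1)              # (N, D) target features
    protos = F.normalize(probs.t() @ feats, dim=1)    # (C, D) class prototypes
    sim = feats @ protos.t()                          # (N, C) cosine similarities
    return sim.argmax(dim=1)                          # refined pseudo-labels
```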

Title: Measuring student behavioral engagement using histogram of actions
Authors: Ahmed Abdelkawy, Aly Farag, Islam Alkabbany, Asem Ali, Chris Foreman, Thomas Tretter, Nicholas Hindy
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 337-344. DOI: 10.1016/j.patrec.2024.11.002

Abstract: In this work, we propose a novel method for assessing students' behavioral engagement by representing a student's actions and their frequencies over an arbitrary time interval as a histogram of actions. This histogram and the student's gaze are used as input to a classifier that determines whether the student is engaged. For action recognition, we use students' skeletons to model their postures and upper-body movements. To learn the dynamics of a student's upper body, a 3D-CNN model is developed. The trained 3D-CNN recognizes actions within every 2-minute video segment; these actions are then used to build the histogram of actions. To evaluate the proposed framework, we build a dataset consisting of 1414 video segments annotated with 13 actions and 963 2-minute video segments annotated with two engagement levels. Experimental results indicate that student actions can be recognized with a top-1 accuracy of 86.32% and that the proposed framework captures the average engagement of the class with a 90% F1-score.
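The histogram-of-actions representation itself is simple; a minimal sketch is below. The normalization step and the concatenation with the gaze feature are assumptions about details the abstract leaves open.

```python
import numpy as np

def engagement_features(action_labels, gaze_vector, num_actions=13):
    """Sketch of the histogram-of-actions feature: count how often each of
    the 13 actions occurs in a time interval, normalize to frequencies, and
    concatenate with the gaze feature before engagement classification."""
    hist = np.bincount(np.asarray(action_labels), minlength=num_actions).astype(float)
    hist /= max(hist.sum(), 1.0)                      # relative frequencies
    return np.concatenate([hist, np.asarray(gaze_vector, dtype=float)])
```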

Title: Design of a differentiable L-1 norm for pattern recognition and machine learning
Authors: Min Zhang, Yiming Wang, Hongyu Chen, Taihao Li, Shupeng Liu, Xianfeng Gu, Xiaoyin Xu
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 126-132. DOI: 10.1016/j.patrec.2024.09.020

Abstract: In various applications of pattern recognition, feature selection, and machine learning, the L-1 norm is used as either an objective function or a regularizer. Mathematically, the L-1 norm has unique characteristics that make it attractive in machine learning, feature selection, optimization, and regression. Computationally, however, the L-1 norm presents a hurdle: it is non-differentiable, making it difficult to find a solution, so existing methods rely on numerical approximations. In this work we design an L-1 norm that is differentiable and thus has an analytical solution. The differentiable L-1 norm removes the absolute sign in the conventional definition and is everywhere differentiable. The new norm is almost everywhere linear, a desirable property also present in the conventional L-1 norm. Its only limitation is that its behavior near zero is not linear; hence we consider the new L-1 norm quasi-linear. Being differentiable, the new L-1 norm and its quasi-linear variant are amenable to analytic solutions, which can facilitate the development and implementation of many algorithms involving the L-1 norm. Our tests validate the capability of the new L-1 norm in various applications.
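The abstract does not give the paper's construction, but a standard smooth surrogate illustrates the properties it describes: everywhere differentiable, nearly linear away from zero, and quasi-linear near zero.

```python
import torch

def smooth_l1_norm(x, eps=1e-6):
    """A common differentiable surrogate for the L-1 norm (not the paper's
    exact construction): sum_i sqrt(x_i^2 + eps). It is everywhere
    differentiable, approximately linear when |x_i| >> sqrt(eps), and only
    deviates from linearity near zero -- the 'quasi-linear' regime."""
    return torch.sqrt(x ** 2 + eps).sum()

x = torch.randn(5, requires_grad=True)
smooth_l1_norm(x).backward()        # gradient exists even at/near zero
print(x.grad)                       # approximately sign(x) away from zero
```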

Title: Online probabilistic knowledge distillation on cryptocurrency trading using Deep Reinforcement Learning
Authors: Vasileios Moustakidis, Nikolaos Passalis, Anastasios Tefas
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 243-249. DOI: 10.1016/j.patrec.2024.10.005

Abstract: Leveraging Deep Reinforcement Learning (DRL) to train agents for financial trading has gained significant attention in recent years. However, training these agents in noisy financial environments remains challenging and unstable, significantly impacting their performance, as the recent literature has showcased. This paper introduces a novel distillation method for DRL agents that aims to improve training stability. The proposed method transfers knowledge from a teacher ensemble to a student model, incorporating both the action probability distribution knowledge from the output layer and the knowledge from the intermediate layers of the teacher's network. It also works in an online fashion, eliminating the separate teacher training process typically involved in many DRL distillation pipelines and thereby simplifying distillation. The method is extensively evaluated on a large-scale cryptocurrency trading setup, demonstrating that it significantly improves trading accuracy and obtained profit while increasing the stability of the training process.
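The abstract names two distillation signals (output-layer action distributions and intermediate-layer features); a hedged sketch of how they might combine is below. The ensemble averaging, the KL/MSE forms, and the `alpha` balance are assumptions, not the paper's specification.

```python
import torch
import torch.nn.functional as F

def online_kd_loss(s_logits, s_hidden, t_logits_list, t_hidden_list, alpha=0.5):
    """Sketch of the two distillation signals: (i) match the teacher
    ensemble's mean action probability distribution at the output layer,
    and (ii) match its mean intermediate representation."""
    t_probs = torch.stack([F.softmax(t, dim=-1) for t in t_logits_list]).mean(0)
    out_loss = F.kl_div(F.log_softmax(s_logits, dim=-1), t_probs, reduction="batchmean")
    t_hidden = torch.stack(t_hidden_list).mean(0)
    hid_loss = F.mse_loss(s_hidden, t_hidden)
    return alpha * out_loss + (1 - alpha) * hid_loss
```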

Title: Explainable hypergraphs for gait based Parkinson classification
Authors: Anirban Dutta Choudhury, Ananda S. Chowdhury
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 1-7. DOI: 10.1016/j.patrec.2024.09.026

Abstract: Parkinson's Disease (PD) classification using Vertical Ground Reaction Force (VGRF) sensors can help in unobtrusive detection and monitoring of PD patients. State-of-the-art (SOTA) research in PD classification reveals that Deep Learning (DL) performs better than Shallow Learning (SL), at the expense of explainability. In this paper, we introduce a novel explainable weighted hypergraph that exploits the interconnections among SOTA features, leading to more discriminative derived features and thereby forming an SL arm. In parallel, we create a DL arm consisting of a ResNet architecture that learns the spatio-temporal patterns of the VGRF signals. The PD classification probabilities from the SL and DL arms are adaptively fused to create a hybrid pipeline. The pipeline achieves an AUC of 0.979 on the Physionet Parkinson dataset, superior to either arm used in isolation (0.878 and 0.852, respectively). The proposed pipeline demonstrates explainability through improved permutation feature importance and through contrasting use cases in which misclassifications by the DL arm are rectified by the SL arm and vice versa. We further demonstrate that our solution achieves performance comparable to SOTA methods. To the best of our knowledge, this is the first approach to analyze PD classification with hypergraph-based xAI (Explainable Artificial Intelligence).
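The fusion step reduces to a convex combination of the two arms' probabilities; the sketch below assumes a scalar weight `w`, whereas the paper adapts this weight by a scheme the abstract does not detail.

```python
import numpy as np

def fuse_arms(p_sl, p_dl, w):
    """Illustrative fusion of the shallow-learning (SL) and deep-learning
    (DL) arm probabilities; w in [0, 1] is the fusion weight, which the
    paper adapts rather than fixing (exact scheme assumed here)."""
    return w * np.asarray(p_sl) + (1.0 - w) * np.asarray(p_dl)

# e.g. a sample the DL arm misclassifies can be rectified by the SL arm:
print(fuse_arms(p_sl=0.9, p_dl=0.3, w=0.6))   # fused PD probability 0.66
```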

Title: Adaptive feature alignment for adversarial training
Authors: Kai Zhao, Tao Wang, Ruixin Zhang, Wei Shen
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 184-190. DOI: 10.1016/j.patrec.2024.10.004

Abstract: Recent studies reveal that Convolutional Neural Networks (CNNs) are typically vulnerable to adversarial attacks, and many adversarial defense methods have been proposed to improve robustness against adversarial samples. However, these methods can only defend against adversarial samples of a specific strength, reducing their flexibility against attacks of varying strengths, and they often enhance adversarial robustness at the expense of accuracy on clean samples. In this paper, we first observe that features of adversarial images change monotonically and smoothly with respect to increasing attack strength. This intriguing observation suggests that features of adversarial images at various attack strengths can be approximated by interpolating between the features at the strongest and weakest strengths; due to the monotonicity property, the interpolation weight can be easily learned by a neural network. Based on this observation, we propose adaptive feature alignment (AFA), which automatically aligns features to defend against adversarial attacks of various strengths. During training, our method learns the statistics of adversarial samples at various attack strengths using a dual batch-norm architecture, in which each batch norm handles samples of a specific attack strength. During inference, our method automatically adjusts to varying attack strengths by linearly interpolating the dual-BN features. Unlike previous methods, which must either retrain the model or manually tune hyper-parameters for a new attack strength, our method handles arbitrary attack strengths with a single model and without introducing any hyper-parameter, and it improves robustness against adversarial samples without incurring much loss of accuracy on clean images. Experiments on the CIFAR-10, SVHN, and tiny-ImageNet datasets demonstrate that our method outperforms the state of the art under various attack strengths and even improves accuracy on clean samples. Code will be made openly available upon acceptance.
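A minimal sketch of the dual batch-norm interpolation described above; the gating network that predicts the interpolation weight from the input is an assumption about how the "easily learned" weight is produced.

```python
import torch
import torch.nn as nn

class DualBN2d(nn.Module):
    """Sketch of the dual-BN idea in AFA (details assumed): one BN branch
    tracks weakest-attack statistics, the other strongest-attack statistics;
    the two normalized features are linearly interpolated with a weight
    w in [0, 1] predicted from the input itself."""
    def __init__(self, channels):
        super().__init__()
        self.bn_weak = nn.BatchNorm2d(channels)    # weakest attack strength
        self.bn_strong = nn.BatchNorm2d(channels)  # strongest attack strength
        self.gate = nn.Sequential(                 # tiny net predicting w
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.gate(x).view(-1, 1, 1, 1)
        return w * self.bn_weak(x) + (1 - w) * self.bn_strong(x)
```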

Title: Discrete diffusion models with Refined Language-Image Pre-trained representations for remote sensing image captioning
Authors: Guannan Leng, Yu-Jie Xiong, Chunping Qiu, Congzhou Guo
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 164-169. DOI: 10.1016/j.patrec.2024.09.019

Abstract: Remote sensing (RS) image captioning (RSIC) uses natural language to describe image content, assisting in the comprehension of object properties and relationships. Nonetheless, RS images are characterized by variations in object scales, distributions, and quantities, which make it challenging to obtain global semantic information and object connections. To improve the accuracy of captions produced from RS images, this paper proposes Discrete Diffusion Models with Refined Language-Image Pre-trained representations (DDM-RLIP), leveraging an advanced discrete diffusion model (DDM) for noising and denoising text tokens. DDM-RLIP builds on an advanced DDM-based method designed for natural images. The primary approach for refining image representations is to fine-tune a CLIP image encoder on RS images and then adapt the transformer with an additional attention module that focuses on crucial image regions and relevant words. Experiments on three datasets (Sydney-Captions, UCM-Captions, and NWPU-Captions) demonstrate superior performance compared with conventional autoregressive models. On NWPU-Captions, the CIDEr score improves from 116.4 to 197.7, further validating the efficacy and potential of DDM-RLIP. The implementation code is available at https://github.com/Leng-bingo/DDM-RLIP.
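One way to realize the "additional attention module" the abstract mentions is cross-attention from text tokens to image-patch features; the sketch below is a generic reading, not the repository's actual architecture (see the linked code for that). The class name, dimensions, and residual layout are assumptions.

```python
import torch
import torch.nn as nn

class RegionWordAttention(nn.Module):
    """Sketch of an attention module letting text tokens attend over
    fine-tuned CLIP image-patch features, so the denoiser can focus on
    crucial regions and relevant words (architecture details assumed)."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_patches):
        # text_tokens: (B, T, dim); image_patches: (B, P, dim) from the CLIP encoder
        attended, _ = self.attn(text_tokens, image_patches, image_patches)
        return self.norm(text_tokens + attended)   # residual connection
```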

Title: Rethinking unsupervised domain adaptation for semantic segmentation
Authors: Zhijie Wang, Masanori Suganuma, Takayuki Okatani
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 119-125. DOI: 10.1016/j.patrec.2024.09.022

Abstract: Unsupervised domain adaptation (UDA) adapts a model trained on one domain (the source) to a novel domain (the target) using only unlabeled data. Because annotation is costly, researchers have developed many UDA methods for semantic segmentation, all assuming that no labeled sample is available in the target domain. We question the practicality of this assumption for two reasons. First, after training a model with a UDA method, we must somehow verify it before deployment. Second, UDA methods have at least a few hyper-parameters that need to be determined. The surest solution to both is to evaluate the model on validation data, i.e., a certain amount of labeled target-domain samples. This question about the basic assumption of UDA leads us to rethink UDA from a data-centric point of view. Specifically, we assume access to a minimum amount of labeled data and ask how much is necessary to find good hyper-parameters for existing UDA methods. We then consider what happens if we use the same data for supervised training of the same model, e.g., finetuning. We conducted experiments on the popular {GTA5, SYNTHIA}→Cityscapes scenarios and found that (i) choosing good hyper-parameters needs only a few labeled images for some UDA methods but many more for others; and (ii) simple finetuning works surprisingly well: it outperforms many UDA methods when only several dozen labeled images are available.
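The finetuning baseline the paper highlights is deliberately simple; a minimal sketch is below, with the optimizer, learning rate, and epoch count as assumptions rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def finetune(model, loader, epochs=20, lr=1e-4):
    """Minimal sketch of the finetuning baseline: take the source-pretrained
    segmentation model and train it directly on the few dozen labeled
    target-domain images (hyper-parameters assumed)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:              # labeled target-domain samples
            # model(images): (B, C, H, W) logits; labels: (B, H, W) class ids
            loss = F.cross_entropy(model(images), labels, ignore_index=255)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```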

Title: Motion-guided small MAV detection in complex and non-planar scenes
Authors: Hanqing Guo, Canlun Zheng, Shiyu Zhao
Pattern Recognition Letters, Vol. 186 (October 2024), pp. 98-105. DOI: 10.1016/j.patrec.2024.09.013

Abstract: In recent years, there has been growing interest in the visual detection of micro aerial vehicles (MAVs) due to its importance in numerous applications. However, existing methods based on either appearance or motion features encounter difficulties when the background is complex or the MAV is too small. In this paper, we propose a novel motion-guided MAV detector that can accurately identify small MAVs in complex and non-planar scenes. The detector first exploits a motion feature enhancement module to capture the motion features of small MAVs. It then uses multi-object tracking and trajectory filtering to eliminate false positives caused by motion parallax. Finally, an appearance-based classifier and an appearance-based detector operating on cropped regions produce the precise detection results. The proposed method effectively and efficiently detects extremely small MAVs against dynamic and complex backgrounds because it aggregates pixel-level motion features and eliminates false positives based on the motion and appearance of MAVs. Experiments on the ARD-MAV dataset demonstrate that the method achieves high performance in small MAV detection under challenging conditions and outperforms other state-of-the-art methods across various metrics.
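The abstract describes a staged pipeline; the skeleton below only fixes the order of those stages. Every callable name and interface is illustrative, since the concrete models are not specified in the abstract.

```python
def detect_small_mavs(frames, enhance_motion, propose, track, is_parallax, classify_crop):
    """Skeleton of the described pipeline with each stage injected as a
    callable (names and interfaces are hypothetical)."""
    motion = enhance_motion(frames)                    # pixel-level motion features
    proposals = propose(motion)                        # candidate MAV regions per frame
    tracks = track(proposals)                          # multi-object tracking
    kept = [t for t in tracks if not is_parallax(t)]   # trajectory filtering removes
                                                       # motion-parallax false positives
    return [t for t in kept if classify_crop(frames, t)]  # appearance-based verification
```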