{"title":"A novel HMM distance measure with state alignment","authors":"Nan Yang , Cheuk Hang Leung , Xing Yan","doi":"10.1016/j.patrec.2024.10.018","DOIUrl":"10.1016/j.patrec.2024.10.018","url":null,"abstract":"<div><div>In this paper, we introduce a novel distance measure that conforms to the definition of a semi-distance, for quantifying the similarity between Hidden Markov Models (HMMs). This distance measure is not only easier to implement, but also accounts for state alignment before distance calculation, ensuring correctness and accuracy. Our proposed distance measure presents a significant advancement in HMM comparison, offering a more practical and accurate solution compared to existing measures. Numerical examples that demonstrate the utility of the proposed distance measure are given for HMMs with continuous state probability densities. In real-world data experiments, we employ HMM to represent the evolution of financial time series or music. Subsequently, leveraging the proposed distance measure, we conduct HMM-based unsupervised clustering, demonstrating promising results. Our approach proves effective in capturing the inherent difference in dynamics of financial time series, showcasing the practicality and success of the proposed distance measure.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 314-321"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LuminanceGAN: Controlling the brightness of generated images for various night conditions","authors":"Junghyun Seo , Sungjun Wang , Hyeonjae Jeon , Taesoo Kim , Yongsik Jin , Soon Kwon , Jeseok Kim , Yongseob Lim","doi":"10.1016/j.patrec.2024.10.014","DOIUrl":"10.1016/j.patrec.2024.10.014","url":null,"abstract":"<div><div>There are diverse datasets available for training deep learning models utilized in autonomous driving. However, most of these datasets are composed of images obtained in day conditions, leading to a data imbalance issue when dealing with night condition images. Several day-to-night image translation models have been proposed to resolve the insufficiency of the night condition dataset, but these models often generate artifacts and cannot control the brightness of the generated image. In this study, we propose a LuminanceGAN, for controlling the brightness degree in night conditions to generate realistic night image outputs. The proposed novel Y-control loss converges the brightness degree of the output image to a specific luminance value. Furthermore, the implementation of the self-attention module effectively reduces artifacts in the generated images. Consequently, in qualitative comparisons, our model demonstrates superior performance in day-to-night image translation. Additionally, a quantitative evaluation was conducted using lane detection models, showing that our proposed method improves performance in night lane detection tasks. Moreover, the quality of the generated indoor dark images was assessed using an evaluation metric. It can be proven that our model generates images most similar to real dark images compared to other image translation models.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 292-299"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Denoising diffusion model with adversarial learning for unsupervised anomaly detection on brain MRI images","authors":"Jongmin Yu , Hyeontaek Oh , Younkwan Lee , Jinhong Yang","doi":"10.1016/j.patrec.2024.10.007","DOIUrl":"10.1016/j.patrec.2024.10.007","url":null,"abstract":"<div><div>This paper proposes the Adversarial Denoising Diffusion Model (ADDM). Diffusion models excel at generating high-quality samples, outperforming other generative models. These models also achieve outstanding medical image anomaly detection (AD) results due to their strong sampling ability. However, the performance of the diffusion model-based methods is highly varied depending on the sampling frequency, and the time cost to generate good-quality samples is significantly higher than that of other generative models. We propose the ADDM, a diffusion model-based AD method trained with adversarial learning that can maintain high-quality sample generation ability and significantly reduce the number of sampling steps. The proposed adversarial learning is achieved by classifying model-based denoised samples and samples to which random Gaussian noise is added to a specific sampling step. Compared with the loss function of diffusion models, defined under the noise space to minimise the predicted noise and scheduled noise, the diffusion model can explicitly learn semantic information about the sample space since adversarial learning is defined based on the sample space. Our experiment demonstrated that adversarial learning helps achieve a data sampling performance similar to the DDPM with much fewer sampling steps. Experimental results show that the proposed ADDM outperformed existing unsupervised AD methods on Brain MRI images. In particular, in the comparison using 22 T1-weighted MRI scans provided by the Centre for Clinical Brain Sciences from the University of Edinburgh, the ADDM achieves similar performance with 50% fewer sampling steps than other DDPM-based AD methods, and it shows 6.2% better performance about the Dice metric with the same number of sampling steps.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 229-235"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Table Transformers for imputing textual attributes","authors":"Ting-Ruen Wei , Yuan Wang , Yoshitaka Inoue , Hsin-Tai Wu , Yi Fang","doi":"10.1016/j.patrec.2024.09.023","DOIUrl":"10.1016/j.patrec.2024.09.023","url":null,"abstract":"<div><div>Missing data in tabular dataset is a common issue as the performance of downstream tasks usually depends on the completeness of the training dataset. Previous missing data imputation methods focus on numeric and categorical columns, but we propose a novel end-to-end approach called Table Transformers for Imputing Textual Attributes (TTITA) based on the transformer to impute unstructured textual columns using other columns in the table. We conduct extensive experiments on three datasets, and our approach shows competitive performance outperforming baseline models such as recurrent neural networks and Llama2. The performance improvement is more significant when the target sequence has a longer length. Additionally, we incorporate multi-task learning to simultaneously impute for heterogeneous columns, boosting the performance for text imputation. We also qualitatively compare with ChatGPT for realistic applications.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 258-264"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142551873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DBCvT: Double Branch Convolutional Transformer for Medical Image Classification","authors":"Jinfeng Li , Meiling Feng , Chengyi Xia","doi":"10.1016/j.patrec.2024.10.008","DOIUrl":"10.1016/j.patrec.2024.10.008","url":null,"abstract":"<div><div>Convolutional Neural Networks (CNNs) are extensively utilized in medical disease diagnosis, demonstrating the prominent performance in most cases. However, medical image processing based on deep learning faces some challenges. The limited availability and time-consuming annotations of medical image data restrict the scale and accuracy of model training. Data diversity and complexity further complicate these challenges. In order to address these issues, we introduce the Double Branch Convolutional Transformer (DBCvT), a hybrid CNN-Transformer feature extractor, which can better capture diverse fine-grained features and remain suitable for small datasets. In this model, separable downsampling convolution (SDConv) is used to mitigate excessive information loss during downsampling in standard convolutions. Additionally, we propose the Dual branch Channel Efficient multi-head Self-Attention (DCESA) mechanism to enhance the self-attention efficiency, consequently elevating network performance and effectiveness. Moreover, we introduce a novel convolutional channel-enhanced Attention mechanism to strengthen inter-channel relationships within feature maps post self-attention. The experiments of DBCvT on various medical image datasets have demonstrated the outstanding classification performance and generalization capability of the proposed model.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 250-257"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regional dynamic point cloud completion network","authors":"Liping Zhu, Yixuan Yang, Kai Liu, Silin Wu, Bingyao Wang, Xianxiang Chang","doi":"10.1016/j.patrec.2024.10.017","DOIUrl":"10.1016/j.patrec.2024.10.017","url":null,"abstract":"<div><div>Point cloud completion network often encodes points into a global feature vector, then predicts the complete point cloud through the vector generation process. However, this method may not accurately capture complex shapes, as global feature vectors struggle to recover their detailed structure. In this paper, we present a novel shape completion network, namely RD-Net, that innovatively focuses on the interaction of information between points to provide both local and global information for generating fine-grained complete shape. Specifically, we propose a stored iteration-based method for point cloud sampling that quickly captures representative points within the point cloud. Subsequently, in order to better predict the shape and structure of the missing part, we design an iterative edge-convolution module. It uses a CNN-like hierarchy for feature extraction and learning context information. Moreover, we design a two-stage reconstruction process for latent vector decoding. We first employ a feature-points-based multi-scale generating decoder to estimate the missing point cloud hierarchically. This is followed by a self-attention mechanism that refines the generated shape and effectively generates structural details. By combining these innovations, RD-Net achieves a 2% reduction in CD error compared to the state-of-the-art method on the ShapeNet-part dataset.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 322-329"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text-free diffusion inpainting using reference images for enhanced visual fidelity","authors":"Beomjo Kim, Kyung-Ah Sohn","doi":"10.1016/j.patrec.2024.10.009","DOIUrl":"10.1016/j.patrec.2024.10.009","url":null,"abstract":"<div><div>This paper presents a novel approach to subject-driven image generation that addresses the limitations of traditional text-to-image diffusion models. Our method generates images using reference images without relying on language-based prompts. We introduce a visual detail preserving module that captures intricate details and textures, addressing overfitting issues associated with limited training samples. The model's performance is further enhanced through a modified classifier-free guidance technique and feature concatenation, enabling the natural positioning and harmonization of subjects within diverse scenes. Quantitative assessments using CLIP, DINO and Quality scores (QS), along with a user study, demonstrate the superior quality of our generated images. Our work highlights the potential of pre-trained models and visual patch embeddings in subject-driven editing, balancing diversity and fidelity in image generation tasks. Our implementation is available at <span><span>https://github.com/8eomio/Subject-Inpainting</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 221-228"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142535003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMIFR: Multi-modal industry focused data repository","authors":"Mingxuan Chen , Shiqi Li , Xujun Wei , Jiacheng Song","doi":"10.1016/j.patrec.2024.11.001","DOIUrl":"10.1016/j.patrec.2024.11.001","url":null,"abstract":"<div><div>In the rapidly advancing field of industrial automation, the availability of robust and diverse datasets is crucial for the development and evaluation of machine learning models. The data repository consists of four distinct versions of datasets: MMIFR-D, MMIFR-FS, MMIFR-OD and MMIFR-P. The MMIFR-D dataset comprises a comprehensive assemblage of 5907 images accompanied by corresponding textual descriptions, notably facilitating the application of industrial equipment classification. In contrast, the MMIFR-FS dataset serves as an alternative variant characterized by the inclusion of 129 distinct classes and 5907 images, specifically catering to the task of few-shot learning within the industrial domain. MMIFR-OD is another alternative variant, comprising 8,839 annotation instances across 128 distinct categories, is predominantly utilized for object detection tasks. Additionally, the MMIFR-P dataset consists of 142 textual–visual information pairs, making it suitable for detecting pairs of industrial equipment. Furthermore, we conduct a comprehensive comparative analysis of our dataset in relation to other datasets used in industrial settings. Benchmark performances for different industrial tasks on our data repository are provided. The proposed multimodal dataset, MMIFR, can be utilized for research in industrial automation, quality control, safety monitoring, and other relevant domains.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 306-313"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142657599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EDS: Exploring deeper into semantics for video captioning","authors":"Yibo Lou , Wenjie Zhang , Xiaoning Song , Yang Hua , Xiao-Jun Wu","doi":"10.1016/j.patrec.2024.09.017","DOIUrl":"10.1016/j.patrec.2024.09.017","url":null,"abstract":"<div><div>Efficiently leveraging semantic information is crucial for advancing video captioning in recent years. But, prevailing approaches that involve designing various Part-of-Speech (POS) tags as prior information lack essential linguistic knowledge guidance throughout the training procedure, particularly in the context of POS and initial description generation. Furthermore, the restriction to a single source of semantic information ignores the potential for varied interpretations inherent in each video. To solve these problems, we propose the Exploring Deeper into Semantics (EDS) method for video captioning. EDS comprises three feasible modules that focus on semantic information. Specifically, we propose the Semantic Supervised Generation (SSG) module. It integrates semantic information as a prior, and facilitates enriched interrelations among words for POS supervision. A novel Similarity Semantic Extension (SSE) module is proposed to employ a query-based semantic expansion for collaboratively generating fine-grained content. Additionally, the proposed Input Semantic Enhancement (ISE) module provides a strategy for mitigating the information constraints faced during the initial phase of word generation. The experiments conducted show that, by exploiting semantic information through supervision, extension, and enhancement, EDS not only yields promising results but also underlines the effectiveness. Code will be available at <span><span>https://github.com/BradenJoson/EDS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 133-140"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FAM: Adaptive federated meta-learning for MRI data","authors":"Indrajeet Kumar Sinha, Shekhar Verma, Krishna Pratap Singh","doi":"10.1016/j.patrec.2024.09.018","DOIUrl":"10.1016/j.patrec.2024.09.018","url":null,"abstract":"<div><div>Federated learning enables multiple clients to collaborate to train a model without sharing data. Clients with insufficient data or data diversity participate in federated learning to learn a model with superior performance. MRI data suffers from inadequate data and different data distribution due to differences in MRI scanners and client characteristics. Also, privacy concerns preclude data sharing. In this work, we propose a novel adaptive federated meta-learning (FAM) mechanism for collaboratively learning a single global model, which is personalized locally on individual clients. The learnt sparse global model captures the common features in the MRI data across clients. This model is grown on each client to learn a personalized model by capturing additional client-specific parameters from local data. Experimental results on multiple data sets show that the personalization process at each client quickly converges using a limited number of epochs. The personalized client models outperformed the locally trained models, demonstrating the efficacy of the FAM mechanism. Additionally, the FAM-based sparse global model has fewer parameters that require less communication overhead during federated learning. This makes the model viable for networks with limited resources.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"186 ","pages":"Pages 205-212"},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142445296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}