Title: Count, decompose and correct: A new approach to handwritten Chinese character error correction
Authors: Pengfei Hu, Jiefeng Ma, Zhenrong Zhang, Jun Du, Jianshu Zhang
DOI: 10.1016/j.patcog.2024.111110
Pattern Recognition, Volume 160, Article 111110. Published 2024-11-19.

Abstract: Recently, handwritten Chinese character error correction has been greatly improved by employing encoder–decoder methods to decompose a Chinese character into an ideographic description sequence (IDS). However, existing methods implicitly capture and encode the linguistic information inherent in IDS sequences, and therefore tend to generate IDS sequences that match seen characters. This poses a challenge for unseen misspelled characters, as the decoder may instead generate an IDS sequence that matches a seen character. We therefore introduce Count, Decompose and Correct (CDC), a novel approach with better generalization to unseen misspelled characters. CDC is mainly composed of three parts: the Counter, the Decomposer, and the Corrector. In the first stage, the Counter predicts the number of occurrences of each radical class without symbol-level position annotations. In the second stage, the Decomposer employs the counting information to generate the IDS sequence step by step; by updating the counting information at each time step, the Decomposer remains aware of which radicals are still expected. The decomposed IDS sequence then determines whether the given character is misspelled. If it is, the Corrector, trained under a transductive transfer learning strategy, predicts the ideal character that the user originally intended to write. We integrate our method into existing encoder–decoder models and significantly enhance their performance.
{"title":"Tensorized latent representation with automatic dimensionality selection for multi-view clustering","authors":"Bing Cai , Gui-Fu Lu , Xiaoxing Guo , Tong Wu","doi":"10.1016/j.patcog.2024.111192","DOIUrl":"10.1016/j.patcog.2024.111192","url":null,"abstract":"<div><div>Latent representation has garnered significant attention in the field of multi-view learning due to its ability to capture the underlying structures of raw data and achieve promising results. However, latent representation-based methods often encounter challenges in selecting the dimensionality of the latent view, which limits their applicability. To address this problem, we propose a novel method called Tensorized Latent Representation with Automatic Dimensionality Selection (TLRADS), which can automatically determine the optimal dimensions. In TLRADS, we leverage the cumulative contribution rate of singular values to determine the number of dimensions for each view-specific latent representation. This approach ensures that the chosen dimensions capture a significant portion of the data’s variability while discarding less relevant information. After obtaining the latent representation views, we incorporate the tensor subspace learning technique to capture the underlying structural information more comprehensively. Finally, an efficient iterative algorithm is designed to solve the TLRADS model. Through experimental validation, we demonstrate the effectiveness of the automatic dimensionality selection strategy in TLRADS. Meanwhile, the experimental results on real-life datasets indicate that TLRADS outperforms state-of-the-art multi-view clustering methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"160 ","pages":"Article 111192"},"PeriodicalIF":7.5,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142722941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: RNDiff: Rainfall nowcasting with Condition Diffusion Model
Authors: Xudong Ling, Chaorong Li, Fengqing Qin, Peng Yang, Yuanyuan Huang
DOI: 10.1016/j.patcog.2024.111193
Pattern Recognition, Volume 160, Article 111193. Published 2024-11-19.

Abstract: Diffusion models are widely used in image generation because they can generate high-quality, realistic samples; in contrast, generative adversarial networks (GANs) and variational autoencoders (VAEs) have some limitations in terms of image quality. We introduce a diffusion model to the precipitation forecasting task and propose a short-term precipitation nowcasting method with a condition diffusion model based on historical observational data, referred to as Rainfall Nowcasting with Condition Diffusion Model (RNDiff). By incorporating an additional conditional decoder module in the denoising process, RNDiff achieves end-to-end conditional rainfall prediction. RNDiff is composed of two networks: a denoising network and a conditional encoder network. The conditional network consists of multiple independent UNet networks, which extract conditional feature maps at different resolutions and provide accurate conditional information that guides the diffusion model for conditional generation. RNDiff surpasses GANs in prediction accuracy, although it requires more computational resources. The model exhibits higher stability and efficiency during training than GAN-based approaches, and generates high-quality precipitation distribution samples that better reflect future actual precipitation conditions. Compared to the current state-of-the-art GAN-based methods, our approach achieves significant improvements on key evaluation metrics: CSI, HSS, and FSS increase by around 8%, 5%, and 6%, respectively. The experiments fully verify the advantages and potential of RNDiff in precipitation forecasting and provide new insights for improving rainfall forecasting. Our project is open source and available on GitHub at https://github.com/ybu-lxd/RNDiff.

Title: Automatic cervical cancer classification using adaptive vision transformer encoder with CNN for medical application
Authors: G. Nirmala, P. Prathap Nayudu, A. Ranjith Kumar, Renuka Sagar
DOI: 10.1016/j.patcog.2024.111201
Pattern Recognition, Volume 160, Article 111201. Published 2024-11-19.

Abstract: Accurate and early cervical cancer screening can reduce the mortality rate of cervical cancer patients. The Pap test, often known as a Pap smear, is one of the methods frequently used for the early diagnosis of cervical cancer; however, manual analysis can be time-consuming. Previous approaches have faced challenges such as low accuracy, increased computational complexity, large feature dimensionality, poor reliability, and increased time consumption due to subpar hyper-parameter optimization. This paper proposes an automatic cervical cancer classification system using a deep learning algorithm to address these issues. The proposed system consists of three stages: pre-processing, segmentation, and classification. Initially, images are collected and pre-processed through normalization, smoothing, and resizing. The pre-processed images are then passed to the segmentation stage, where an Adaptive Deep Residual Aggregation Network (ADRAN) is utilized. After segmentation, the images are classified into seven categories (Carcinoma_in_situ, Light_dysplastic, Moderate_dysplastic, Normal_columnar, Normal_Intermediate, Normal_superficial, and Severe_dysplastic) using an Adaptive Vision Transformer Encoder (AVTE) with CNN. To improve the efficiency of the transformer learning network, the hyperparameters of AVTE with CNN are optimized using an Adaptive Cat Swarm Optimization algorithm (ACSO). The efficiency of the presented technique is evaluated on various metrics, with experimentation conducted on the Herlev dataset.
{"title":"BiFPN-YOLO: One-stage object detection integrating Bi-Directional Feature Pyramid Networks","authors":"John Doherty , Bryan Gardiner , Emmett Kerr , Nazmul Siddique","doi":"10.1016/j.patcog.2024.111209","DOIUrl":"10.1016/j.patcog.2024.111209","url":null,"abstract":"<div><div>Object detection is a key component in computer vision research, allowing a system to determine the location and type of object within any given scene. YOLOv5 is a modern object detection model, which utilises the advantages of the original YOLO implementation while being built from scratch in Python. In this paper, BiFPN-YOLO is proposed, featuring clear improvements over the existing range of YOLOv5 object detection models; these include replacing the traditional Path-Aggregation Network (PANet) with a higher performing Bi-Directional Feature Pyramid Network (BiFPN), requiring complex adaptation from its original implementation to function with YOLOv5, as well as exploring a replacement to the standard Swish activation function by evaluating the performance against a number of other activation functions. The proposed model showcases state-of-the-art performance, benchmarking against well-known datasets such as the German Traffic Sign Detection Benchmark (GTSDB), improving mAP by 3.1 %, and the RoboFEI@Home dataset, where Mean Average Precision (mAP) is improved by 2 % compared to the base YOLOv5 model. Performance was also improved on MSCOCO by 1.1 % and a custom subset of the OpenImagesV6 dataset by 2.4 %.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"160 ","pages":"Article 111209"},"PeriodicalIF":7.5,"publicationDate":"2024-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142722949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Title: Optimizing reinforcement learning for large action spaces via generative models: Battery pattern selection
Authors: Jingwei Hu, Xinjie Li, Xiaodong Li, Zhensong Hou, Zhihong Zhang
DOI: 10.1016/j.patcog.2024.111194
Pattern Recognition, Volume 160, Article 111194. Published 2024-11-18.

Abstract: Intrinsic and environmental factors contribute to variability in the performance of cells within a battery pack, affecting the lifespan and safety of battery systems. To address this problem, active and passive equalization methods have been proposed. However, existing passive equalization methods suffer from energy loss and low efficiency, while existing active equalization methods require complex expert knowledge and control algorithms. We propose an active equalization model that leverages a generative model (GM) to assist in pattern selection for a reinforcement learning (RL) scheme, tailored for Dynamic Reconfigurable Battery (DRB) systems. The proposed model overcomes the pattern selection challenge in large-scale discrete action spaces by employing a Variational Autoencoder (VAE) for dimensionality reduction and latent space mapping, actively balancing DRB systems. Moreover, the use of pattern subgraphs diminishes dependence on expert knowledge, enabling the model to recognize structural information and adjust the system's stability. The experimental setup adheres to the laws of physics and tests the model's functionality on a simulation system. Results show that the proposed Generative Model-based Reinforcement Learning (GMRL) approach effectively addresses decision-making challenges in large-scale action spaces: it learns the structured features of the battery network, balancing the energy storage system and maximizing discharge efficiency gains.

Title: Explainable monotonic networks and constrained learning for interpretable classification and weakly supervised anomaly detection
Authors: Valentine Wargnier-Dauchelle, Thomas Grenier, Françoise Durand-Dubief, François Cotton, Michaël Sdika
DOI: 10.1016/j.patcog.2024.111186
Pattern Recognition, Volume 160, Article 111186. Published 2024-11-17.

Abstract: Deep network interpretability is fundamental in critical domains like medicine: easily explainable networks whose decisions are based on radiological signs rather than on spurious confounders would reassure clinicians. Confidence is reinforced by the integration of intrinsic properties, and the characteristics of monotonic networks could be used to design such intrinsically explainable networks. Because they are considered too constrained and difficult to train, monotonic networks are often very shallow and rarely used for image applications. In this work, we propose a procedure to transform any architecture into a trainable monotonic network, identify the critical importance of weight initialization, and highlight the interest of such networks for explainability and interpretability. By constraining the features and gradients of a healthy-vs-pathological image classifier, we show, using counterfactual examples, that the network's decision is based more on radiological signs of the pathology, and that it outperforms state-of-the-art weakly supervised anomaly detection methods.

Title: Percept, Chat, Adapt: Knowledge transfer of foundation models for open-world video recognition
Authors: Boyu Chen, Siran Chen, Kunchang Li, Qinglin Xu, Yu Qiao, Yali Wang
DOI: 10.1016/j.patcog.2024.111189
Pattern Recognition, Volume 160, Article 111189. Published 2024-11-17.

Abstract: Open-world video recognition is challenging since traditional networks do not generalize well to complex environment variations. Alternatively, foundation models with rich knowledge have recently shown their generalization power. However, how to apply such knowledge to open-world video recognition has not been fully explored. To this end, we propose a generic knowledge transfer pipeline, which progressively exploits and integrates external multimodal knowledge from foundation models to boost open-world video recognition. We name it PCA, after its three stages: Percept, Chat, and Adapt. First, the Percept stage reduces the video domain gap and obtains external visual knowledge. Second, the Chat stage generates rich linguistic semantics as external textual knowledge. Finally, the Adapt stage blends the external multimodal knowledge by inserting multimodal knowledge adaptation modules into the networks. We conduct extensive experiments on three challenging open-world video benchmarks, i.e., TinyVIRAT, ARID, and QV-Pipe, and achieve state-of-the-art performance on all three datasets.
{"title":"Debiasing weighted multi-view k-means clustering based on causal regularization","authors":"Xiuqi Huang, Hong Tao, Haotian Ni, Chenping Hou","doi":"10.1016/j.patcog.2024.111195","DOIUrl":"10.1016/j.patcog.2024.111195","url":null,"abstract":"<div><div>In the field of unsupervised learning, many methods such as clustering rely on exploring the correlations among features. However, considering these correlations is not always advantageous for learning models. The biased selection of data may lead to redundant and unstable correlations among features, adversely affecting the performance of learning models. Multi-view data presents more complex feature correlations with potential redundancy and varying distributions across views, necessitating detailed analysis. This paper proposes a causal regularized debiased multi-view k-means clustering (DMKC) method to counteract redundant feature correlations stemming from sample selection bias. This method introduces a covariate weighted balance method from causal inference to mitigate redundant bias in multi-view clustering by adjusting sample weights. The approach combines sample and view weights within a k-means loss framework, effectively eliminating feature redundancy and enhancing clustering performance amidst sample selection bias. The optimization process of the relevant parameters is detailed in this paper, and comprehensive experiments demonstrate the effectiveness of the method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"160 ","pages":"Article 111195"},"PeriodicalIF":7.5,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142722935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STARNet: Low-light video enhancement using spatio-temporal consistency aggregation","authors":"Zhe Wu , Zehua Sheng , Xue Zhang , Si-Yuan Cao , Runmin Zhang , Beinan Yu , Chenghao Zhang , Bailin Yang , Hui-Liang Shen","doi":"10.1016/j.patcog.2024.111180","DOIUrl":"10.1016/j.patcog.2024.111180","url":null,"abstract":"<div><div>In low-light environments, capturing high-quality videos is an imaging challenge due to the limited number of photons. Previous low-light enhancement approaches usually result in over-smoothed details, temporal flickers, and color deviation. We propose STARNet, an end-to-end video enhancement network that leverages temporal consistency aggregation to address these issues. We introduce a spatio-temporal consistency aggregator, which extracts structures from multiple frames in hidden space to overcome detail corruption and temporal flickers. It parameterizes neighboring frames to extract and align consistent features, and then selectively fuses consistent features to restore clear structures. To further enhance temporal consistency, we develop a local temporal consistency constraint with robustness against the warping error from motion estimation. Furthermore, we employ a normalized low-frequency color constraint to regularize the color as the normal-light condition. Extensive experimental results on real datasets show that the proposed method achieves better detail fidelity, color accuracy, and temporal consistency, outperforming state-of-the-art approaches.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"160 ","pages":"Article 111180"},"PeriodicalIF":7.5,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142722946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}