{"title":"Network Fission Ensembles for low-cost self-ensembles","authors":"Hojung Lee, Jong-Seok Lee","doi":"10.1016/j.patrec.2025.01.032","DOIUrl":"10.1016/j.patrec.2025.01.032","url":null,"abstract":"<div><div>Recent ensemble learning methods for image classification have demonstrated improved accuracy with low extra cost. However, they still rely on multiple trained models for ensemble inference, which can become a significant burden as the model size grows. Moreover, their performance has been somewhat limited compared to Deep Ensembles, primarily due to the lower performance of individual ensemble members. In this paper, we propose a low-cost ensemble learning and inference method called Network Fission Ensembles (NFE), which transforms a conventional network into a multi-exit structure, allowing predictions to be made at different stages and enabling ensemble learning. To achieve this, we group the weight parameters in the layers into several sets and create multiple auxiliary paths by combining each set to construct multi-exits. We call this process Network Fission. Since this process simply changes the existing network structure to have multiple exits (i.e., classification outputs) without using additional networks, there is no extra computational burden. Furthermore, we employ an ensemble knowledge distillation technique exploiting the losses of all exits to train the network, so that we can achieve high generalization performance despite the reduced network size of each path composed of pruned weights. With our simple yet effective method, we achieve an accuracy of 83.5% on CIFAR100 with Wide-ResNet28-10, surpassing the best existing ensemble method, Deep Ensembles, which achieves 83.0%, while requiring only one-third of the computational complexity. The code is available at <span><span>https://github.com/hjdw2/NFE</span></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 22-28"},"PeriodicalIF":3.9,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143378208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geometric insights into focal loss: Reducing curvature for enhanced model calibration","authors":"Masanari Kimura , Hiroki Naganuma","doi":"10.1016/j.patrec.2025.01.031","DOIUrl":"10.1016/j.patrec.2025.01.031","url":null,"abstract":"<div><div>The key factor in deploying machine learning algorithms in decision-making situations is not only the accuracy of the model but also its confidence level. The confidence level of a model in a classification problem is often given, for convenience, by the output vector of a softmax function. However, these values are known to deviate significantly from the actual expected model confidence. This problem is known as model calibration and has been studied extensively. One of the simplest techniques for this task is focal loss, a generalization of cross-entropy obtained by introducing a single positive parameter. Although many related studies exist owing to the simplicity of the idea and its formalization, the theoretical analysis of its behavior is still insufficient. In this study, our objective is to understand the behavior of focal loss by reinterpreting this function geometrically. Our analysis suggests that focal loss reduces the curvature of the loss surface during training. This indicates that curvature may be one of the essential factors in achieving model calibration. We design numerical experiments to support this conjecture, revealing the behavior of focal loss and the relationship between calibration performance and curvature.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 195-200"},"PeriodicalIF":3.9,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143335460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TS-DETR: Multi-scale DETR for traffic sign detection and recognition","authors":"Yang Cui, Yi Han, Dong Guo","doi":"10.1016/j.patrec.2025.01.027","DOIUrl":"10.1016/j.patrec.2025.01.027","url":null,"abstract":"<div><div>Traffic Sign Detection and Recognition (TSDR) is essential for driverless cars and driver assistance systems. Machine vision tasks such as TSDR have gained significant attention. Convolutional neural networks (CNNs) are often employed for these tasks, but the introduction of vision transformers offers an alternative approach to global feature learning. As a novel object detection paradigm, DETR (Detection Transformer) can correlate contextual information, making it suitable for TSDR tasks. In this paper, we propose a novel network, TS-DETR, designed for TSDR. This network builds upon RT-DETR (Real-Time Detection Transformer) and incorporates a multi-sequence feature fusion strategy along with an enhanced attention mechanism. First, we design a feature connection module alongside a multi-sequence feature fusion module, allowing the TS-DETR network to acquire more comprehensive multi-scale information. Then, considering the positional characteristics of traffic signs, we introduce channel and spatial attention modules, which enhance the network’s ability to utilize this information. Finally, we incorporate an inverted residual moving block to balance network performance. Experimental results on the publicly available CCTSDB dataset show that TS-DETR improves detection accuracy for all three traffic sign categories, outperforms YOLOv5s and YOLOv8s, and improves the recognition rate by 3.5% compared to the original RT-DETR algorithm. These results underscore the effectiveness of TS-DETR for TSDR tasks.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 147-152"},"PeriodicalIF":3.9,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143509119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few-shot Hierarchical Text Classification with Bidirectional Path Constraint by label weighting","authors":"Mingbao Zhang , Rui Song , Xiang Li , Adriano Tavares , Hao Xu","doi":"10.1016/j.patrec.2025.01.025","DOIUrl":"10.1016/j.patrec.2025.01.025","url":null,"abstract":"<div><div>Hierarchical Text Classification (HTC) organizes candidate labels into a hierarchical structure and uses one or more paths within the hierarchy as the ground-truth labels, and it has been applied to various downstream tasks, <em>e.g.,</em> sentiment analysis and harmful text detection. Existing works often involve data-driven models that are trained on large-scale datasets. However, creating annotated datasets is labor-intensive and time-consuming. To address this issue, recent work has focused on the few-shot HTC task, where each class has only a few samples, <em>e.g.,</em> 5. These approaches perform classification at each layer separately and leverage the prompt learning capability of pre-trained models such as BERT. However, we find that these methods neglect inter-layer relationships. To solve this problem, we propose a new model called Bidirectional Path Constraint by Label Weighting (<span>Bpc-lw</span>). Its basic idea is to use a pre-defined label embedding matrix and a feed-forward neural network for information propagation between layers, while also designing a bidirectional label weighting method that constrains the predictions of each layer to lie along the same path in the label hierarchy. In addition, we employ a contrastive learning-based method to enhance the discriminative capacity of the hierarchical embeddings. We compare our proposed method with recent few-shot HTC baseline models across 3 benchmark datasets, and the experimental results demonstrate the effectiveness of <span>Bpc-lw</span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 81-88"},"PeriodicalIF":3.9,"publicationDate":"2025-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143437600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thermal error recognition and prediction method for economical motorized spindles based on thermal characteristics knowledge graph","authors":"Zhifeng Liu , Kaien Wei , Chuanhai Chen , Baobao Qi , Jinyan Guo , Zhiwen Lin","doi":"10.1016/j.patrec.2025.02.004","DOIUrl":"10.1016/j.patrec.2025.02.004","url":null,"abstract":"<div><div>During CNC machining processes, economical motorized spindles exhibit nonlinear temperature increases and progressive thermal errors. Accurate recognition and prediction of these thermal errors are essential prerequisites for achieving stable and controllable machining processes. Existing thermal error models primarily rely on temperature signals while neglecting the coupling effects of other non-stationary, multi-source heterogeneous signals on thermal characteristics. This oversight results in limited theoretical understanding of the relationship between heat generation and multiple signal sources, leading to significant inaccuracies in thermal error recognition and prediction. To address these limitations, this paper elucidates the mechanism of motorized spindle thermal characteristics and their mapping relationships with multi-source signals, organizing them into interpretable thermal characteristic knowledge. Based on this foundation, a novel knowledge graph-based method for thermal error recognition and prediction is proposed, which effectively mitigates machining errors caused by thermal deformation. First, a knowledge graph is employed to structurally characterize the multi-source heterogeneous information of motorized spindles during the machining process, revealing the mapping mechanism of “machining process-multi-source signals-thermal characteristics-thermal errors”. The thermal characteristic knowledge is then systematically extracted through graph embedding. Second, a Long Short-Term Memory (LSTM) network with a flexible multi-head attention mechanism is developed, integrating graph embedding features and sensor data characteristics for thermal error recognition and prediction. Finally, the effectiveness and superiority of this method under different working conditions are verified on a high-speed CNC machine tool. The results demonstrate up to 36% improvement in prediction accuracy compared to conventional machine learning methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 229-238"},"PeriodicalIF":3.9,"publicationDate":"2025-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143386987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DSC-GRUNet: A lightweight neural network model for multimodal gesture recognition based on depthwise separable convolutions and GRU","authors":"Huaigang Yang , Dong Zhang , Ping Xie , Xiaoling Chen","doi":"10.1016/j.patrec.2025.02.008","DOIUrl":"10.1016/j.patrec.2025.02.008","url":null,"abstract":"<div><div>With the advancement of human-computer interaction (HCI) technology, gesture recognition methods based on electromyography (EMG) signals have garnered widespread attention, particularly in fields such as rehabilitation medicine and smart prosthetics. However, traditional EMG-based gesture recognition methods face challenges, including insufficient accuracy and poor noise resistance when handling complex gestures and diverse scenarios. To address these challenges, this study proposes a lightweight gesture recognition network based on multimodal signal fusion, combining surface EMG and acceleration (ACC) signals. The proposed model integrates Depthwise Separable Convolutions (DSC) and Gated Recurrent Units (GRU) to achieve a lightweight design while maintaining recognition performance. Experimental results demonstrate that the proposed method achieves recognition accuracies of 92.03±3.28% and 77.48±4.38% on the NinaPro DB2 and DB5 datasets, respectively, outperforming other state-of-the-art methods in terms of efficiency and computational cost. Additionally, the fusion of multimodal data significantly enhances the recognition performance of dynamic gestures. This study provides new insights into the design of embedded, real-time gesture recognition systems and holds important practical implications.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 35-44"},"PeriodicalIF":3.9,"publicationDate":"2025-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143386255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An explainable super-resolution visual method for micro-crack image detection","authors":"Li Yin , Mingyang Cheng , Shuaiming Su , Ray Y. Zhong , Shuxuan Zhao","doi":"10.1016/j.patrec.2025.02.007","DOIUrl":"10.1016/j.patrec.2025.02.007","url":null,"abstract":"<div><div>To improve the performance of crack image detection in the construction industry, this paper designs a gradient-guided micro-crack image super-resolution (SR) visual method. First, to address the problem of low resolution (LR) and smooth grayscale differences in the images, an interpretable gradient-guided image SR model is developed to achieve high-fidelity SR reconstruction of LR images. Then, to address the large amount of interference noise in the background, a micro-crack pixel-level selection module is proposed based on the SR model, which achieves high-fidelity reconstruction of the micro-crack region while reducing the impact of interference noise to a certain extent. Finally, this paper validates and analyzes the performance of the proposed methods on a real crack image dataset, demonstrating their effectiveness.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 157-165"},"PeriodicalIF":3.9,"publicationDate":"2025-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143178195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized relative neighborhood graph (GRNG) for similarity search","authors":"Cole Foster, Berk Sevilmis, Benjamin Kimia","doi":"10.1016/j.patrec.2024.12.002","DOIUrl":"10.1016/j.patrec.2024.12.002","url":null,"abstract":"<div><div>A critical factor in graph-based similarity search is the choice of graph that represents the underlying space’s structure. Proximity graphs, such as Relative Neighborhood Graphs (RNG), are defined by the neighborhood relations between pairs of points. As a result, computing these graphs typically involves a more computationally intensive ternary operation, making them an order of magnitude more expensive than the widely used <span><math><mi>k</mi></math></span>-Nearest Neighbors (<span><math><mi>k</mi></math></span>NN) graph, which relies only on pairwise distances. However, the <span><math><mi>k</mi></math></span>NN graph often suffers from disconnections and requires manual parameter selection, whereas the RNG better captures the geometry of the space. While several methods have attempted to reduce the computational cost of constructing an RNG, these are usually approximate and lack scalability. This paper introduces an incremental, hierarchical method that employs a novel proximity graph called the Generalized Relative Neighborhood Graph (GRNG). The GRNG organizes a pivot layer that efficiently guides the exact construction of the graph for the subsequent layer. This multi-layer, exact approach to RNG construction represents a significant improvement over existing methods that only produce approximate RNGs.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 103-110"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Industrial product quality assessment using deep learning with defect attributes","authors":"Gaowei Zhang , Yang Lu , Xiaoheng Jiang , Feng Yan , Mingliang Xu","doi":"10.1016/j.patrec.2024.11.032","DOIUrl":"10.1016/j.patrec.2024.11.032","url":null,"abstract":"<div><div>Industrial defect quantification refers to the process of quantifying various defects that occur during the production of industrial products, with the aim of achieving precise grading of product quality. This process is a key component in the quality monitoring phase of production. The quantification of defects faces challenges such as accurately describing the appearance of defects and difficulty in dividing product grades. In response to these issues, we propose a Defect Attribute-based Quantitative Assessment Network (DQANet). It employs a Refined Feature Attention Module (RFAM) to guide the flow of information and fusion of features during the encoding process and uses deformable convolution at the end of the encoder to extract richer contextual information while effectively suppressing noise. We conducted a quantitative evaluation of defects and annotated quality grades on the Mobile Screen Defect (MSD) dataset to train and validate the performance of our model. The experimental results indicate that our method has achieved promising performance. In addition, this work applies the defect attribute-based quantification assessment network to the tasks of defect classification and segmentation. Experimental results show that defect quantification assessment can provide effective auxiliary information, helping the network understand defect features more accurately and improve the performance of related visual tasks.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 67-73"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced multi-modal abdominal image registration via structural awareness and region-specific optimization","authors":"Zhengwei Wang , Xukun Zhang , Xiaoying Wang , Lihua Zhang","doi":"10.1016/j.patrec.2024.11.026","DOIUrl":"10.1016/j.patrec.2024.11.026","url":null,"abstract":"<div><div>Multi-modal abdominal image registration is a critical step in achieving accurate diagnosis and treatment. However, the large field of view, complex organ structures, and significant anatomical variability in the abdominal region pose substantial challenges to precise multi-modal registration. To address these challenges, we propose a novel framework based on structural awareness and region-specific optimization, which progressively enhances registration accuracy through a hierarchical strategy. The core innovation lies in introducing the Modality Independent Structural Awareness (MISA) loss, which comprises two key components: first, it ensures structural consistency across modalities by deeply understanding the critical anatomical structures; second, it employs a region-specific weighting strategy to prioritize key anatomical regions, such as organ boundaries, thereby achieving high-precision, region-specific constraints in multi-modal registration. Experimental results demonstrate that our method outperforms state-of-the-art approaches on two public multi-modal abdominal datasets, particularly in accurately capturing complex anatomical structures and maintaining tissue consistency. Further analysis confirms that the MISA loss and its two innovative components are pivotal in enhancing multi-modal abdominal image registration accuracy.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 29-36"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}