{"title":"A small object detection method with context information for high altitude images","authors":"Zhengkai Ma , Linli Zhou , Di Wu , Xianliu Zhang","doi":"10.1016/j.patrec.2024.11.027","DOIUrl":"10.1016/j.patrec.2024.11.027","url":null,"abstract":"<div><div>Detection of small objects stands as a pivotal and difficult task because of their low resolution and lack of visualization features. Though achieving some promising results, recent detection methods utilize the context information insufficiently, leading to inadequate small object feature representation and increasing the misdetection and omission rates. We propose a method named Context Information Enhancement YOLO(CIE-YOLO) for small object detection. CIE-YOLO mainly includes a Context Reinforcement Module(CRM), a Channel Spatial Joint Attention(CSJA) module, and a Pixel Feature Enhancement Module(PFEM). The CRM module extracts and enhances the context information to mitigate the confusion between small objects and the background in the network. Then CSJA suppresses the background noise to highlight important small object features. Finally, PFEM reduces the small object feature losses in up-sampling via feature enhancement and pixel resolution enhancement. The effectiveness of the proposed CIE-YOLO in small object detection is demonstrated by extensive experiments.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 22-28"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical knowledge aware synthesized CT image-based framework for improved detection and segmentation of hemorrhages","authors":"Chitimireddy Sindhura, Subrahmanyam Gorthi","doi":"10.1016/j.patrec.2024.11.028","DOIUrl":"10.1016/j.patrec.2024.11.028","url":null,"abstract":"<div><div>Intracranial hemorrhage (ICH) is a life-threatening condition characterized by bleeding within the brain tissue, necessitating immediate diagnosis and treatment to improve survival rates. CT imaging is the most commonly used modality for ICH diagnosis. Current methods typically depend on extensive annotated datasets and complex networks, and do not explicitly utilize the patient-specific clinical insights, which are crucial for precise diagnoses. In this paper, we introduce a novel deep-learning framework that utilizes synthesized CT images infused with clinical brain information to enhance the detection and segmentation of hemorrhages. This approach enhances data by synthesizing CT images based on the midsagittal plane and creates an asymmetry map that highlights the differences between the left and right halves of the CT image. We evaluated the performance of this approach using state-of-the-art deep learning architectures on two public datasets, INSTANCE and BHSD data sets, comprising around 300 CT scans with various types of haemorrhages. Results show that incorporating anatomical information improves the Dice Similarity Coefficient (DSC) for ICH segmentation by 7%–12% and increases detection accuracy by 4%–8%. Our findings suggest that incorporating prior anatomical knowledge can significantly enhance automated ICH diagnosis systems, paving the way for more reliable diagnostic solutions, even with limited data availability.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 46-52"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cross-feature interaction network for 3D human pose estimation","authors":"Jihua Peng , Yanghong Zhou , P.Y. Mok","doi":"10.1016/j.patrec.2025.01.016","DOIUrl":"10.1016/j.patrec.2025.01.016","url":null,"abstract":"<div><div>The task of estimating 3D human poses from single monocular images is challenging because, unlike video sequences, single images can hardly provide any temporal information for the prediction. Most existing methods attempt to predict 3D poses by modeling the spatial dependencies inherent in the anatomical structure of the human skeleton, yet these methods fail to capture the complex local and global relationships that exist among various joints. To solve this problem, we propose a novel Cross-Feature Interaction Network to effectively model spatial correlations between body joints. Specifically, we exploit graph convolutional networks (GCNs) to learn the local features between neighboring joints and the self-attention structure to learn the global features among all joints. We then design a cross-feature interaction (CFI) module to facilitate cross-feature communications among the three different features, namely the local features, global features, and initial 2D pose features, aggregating them to form enhanced spatial representations of human pose. Furthermore, a novel graph-enhanced module (GraMLP) with parallel GCN and multi-layer perceptron is introduced to inject the skeletal knowledge of the human body into the final representation of 3D pose. Extensive experiments on two datasets (Human3.6M (Ionescu et al., 2013) and MPI-INF-3DHP (Mehta et al., 2017)) show the superior performance of our method in comparison to existing state-of-the-art (SOTA) models. The code and data are shared at <span><span>https://github.com/JihuaPeng/CFI-3DHPE</span><svg><path></path></svg></span></div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 175-181"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143335308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global–local feature-mixed network with template update for visual tracking","authors":"Li Zhao , Chenxiang Fan , Min Li , Zhonglong Zheng , Xiaoqin Zhang","doi":"10.1016/j.patrec.2024.11.034","DOIUrl":"10.1016/j.patrec.2024.11.034","url":null,"abstract":"<div><div>Deep learning trackers have succeeded with a powerful local and global feature extraction capacity. However, both Siamese-based trackers with local convolution and Transformer-based trackers with global Transformer do not fully utilize frames. These trackers cannot obtain accurate tracking when they are faced with target appearance changes. This paper proposes a global–local features mixed tracker named GLT to complement the advantages of global and local frame features. GLT uses depth-wise convolution with dynamic weight to get local features and residual Transformer to get global features. Owing to global and local details, our method can perform accurate and robust tracking. Meanwhile, GLT has a template update strategy based on the key frame to face long-term tracking challenge. Numerous experiments show that our GLT achieves excellent performance on short-term and long-term benchmarks, including GOT-10k, TrackingNet and LaSOT. Furthermore, without many attention operations like other Transformer-based trackers, our GLT has fewer parameters and runs in real-time.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 111-116"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SP-Det: Anchor-based lane detection network with structural prior perception","authors":"Libo Sun, Hangyu Zhu, Wenhu Qin","doi":"10.1016/j.patrec.2024.11.030","DOIUrl":"10.1016/j.patrec.2024.11.030","url":null,"abstract":"<div><div>Effective perception and accurate localization of lane lines are the key points for intelligent vehicles to plan local driving paths and realize lane keeping and departure warning. However, the elongated structure of lane lines makes the performance of detectors degrade significantly when visual cues are scarce. The continuity of lane lines also puts forward higher requirements for the ability of algorithms to model long-range dependencies. In this paper, we propose a novel anchor-based lane detection network (SP-Det) combining the unique structural characteristics and pixel distribution of lane lines. Specifically, we introduce a Semantic-Guided Feature Calibration Unit (SG-FCU) to semantically calibrate and refine features from different layers and to narrow the semantic gap during fusion. Additionally, we propose a Spatial-aware Context Aggregation Block (S-CAB) and a Lane-aware Information Enhancement Module (LIEM) to improve the prediction accuracy of horizontal offsets of line anchors through global feature encoding and row-wise information sharing. The results of quantitative and qualitative experiments show that SP-Det achieves state-of-the-art performance on CULane and Tusimple benchmark datasets.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 60-66"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An asymmetric heuristic for trained ternary quantization based on the statistics of the weights: An application to medical signal classification","authors":"Yamil Vindas , Emmanuel Roux , Blaise Kévin Guépié , Marilys Almar , Philippe Delachartre","doi":"10.1016/j.patrec.2024.11.016","DOIUrl":"10.1016/j.patrec.2024.11.016","url":null,"abstract":"<div><div>One of the main challenges in the field of deep learning and embedded systems is the mismatch between the memory, computational and energy resources required by the former for good performance and the resource capabilities offered by the latter. It is therefore important to find a good trade-off between performance and computational resources used. In this study, we propose a novel ternarization heuristic based on the statistics of the weights, in addition to asymmetric pruning. Our approach involves the computation of two asymmetric thresholds based on the mean and standard deviation of the weights. This allows us to distinguish between positive and negative values prior to ternarization. Two hyperparameters are introduced into these thresholds, which permit the user to control the trade-off between compression and classification performance. Following thresholding, ternarization is carried out in accordance with the methodology of trained ternary quantization (TTQ). The efficacy of the method is evaluated on three datasets, two of which are medical: a cerebral emboli (HITS) dataset, an epileptic seizure recognition (ESR) dataset, and the MNIST dataset. Two types of deep learning models were tested: 2D convolutional neural networks (CNNs) and 1D CNN-transformers. The results demonstrate that our approach, aTTQ, achieves a superior trade-off between classification performance and compression rate compared with TTQ, for all the models and datasets. In fact, our method is capable of reducing the memory requirements of a 1D CNN-transformer model for the ESR dataset by over 21% compared to TTQ, while maintaining a Matthews correlation coefficient of 95%. The code is available at: <span><span>https://github.com/yamilvindas/aTTQ</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 37-45"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New advances in body composition assessment with ShapedNet: A single image deep regression approach","authors":"Navar Medeiros M. Nascimento , Pedro Cavalcante de Sousa Junior , Pedro Yuri Rodrigues Nunes , Suane Pires Pinheiro da Silva , Luiz Lannes Loureiro , Victor Zaban Bittencourt , Valden Luis Matos Capistrano Junior , Pedro Pedrosa Rebouças Filho","doi":"10.1016/j.patrec.2024.11.029","DOIUrl":"10.1016/j.patrec.2024.11.029","url":null,"abstract":"<div><div>We introduce a novel technique called ShapedNet to enhance body composition assessment. This method employs a deep neural network capable of estimating Body Fat Percentage (BFP), performing individual identification, and enabling localization using a single photograph. The accuracy of ShapedNet is validated through comprehensive comparisons against the gold standard method, Dual-Energy X-ray Absorptiometry (DXA), utilizing 1273 healthy adults spanning various ages, sexes, and BFP levels. The results demonstrate that ShapedNet outperforms in 19.5% state of the art computer vision-based approaches for body fat estimation, achieving a Mean Absolute Percentage Error (MAPE) of 4.91% and Mean Absolute Error (MAE) of 1.42. The study evaluates both gender-based and Gender-neutral approaches, with the latter showcasing superior performance. The method estimates BFP with 95% confidence within an error margin of 4.01% to 5.81%. This research advances multi-task learning and body composition assessment theory through ShapedNet.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 88-94"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced multiscale attentional feature fusion model for defect detection on steel surfaces","authors":"Yongkai Xia , Yang Lu , Xiaoheng Jiang , Mingliang Xu","doi":"10.1016/j.patrec.2024.11.024","DOIUrl":"10.1016/j.patrec.2024.11.024","url":null,"abstract":"<div><div>Surface defect detection has been an important part of controlling the quality of industrial products. Detecting small defects poses a persistent challenge, primarily due to the scarcity of available information. To solve this problem, this paper proposes an improved model Bi-Level Efficient Global YOLO (BEG-YOLO) based on YOLOv8x. The architecture consists of Cross Stage Partial DenseNet Global Feature Pyramid Network (CSP-GFPN), and Bi-Level Efficient Attention (BEA). CSP-GFPN is a feature pyramid network structure that employs enhanced feature fusion across scales. It extends the neck network of BEG-YOLO while enabling information sharing across different spatial scales and potentially non-adjacent semantic layers. BEA is a fusion of attention. It serves to eliminate most of the irrelevant feature key–value pair inputs at the broader feature map level, thus focusing more comprehensively on the few relevant regions that remain. Three public datasets, NEU-DET, GC10-DET, and X-SDD, are used in this experiment. According to the experimental results, better results are achieved in terms of <span><math><mrow><mi>m</mi><mi>A</mi><msub><mrow><mi>P</mi></mrow><mrow><mn>50</mn></mrow></msub></mrow></math></span>, <span><math><mrow><mi>m</mi><mi>A</mi><msub><mrow><mi>P</mi></mrow><mrow><mn>50</mn><mo>−</mo><mn>95</mn></mrow></msub></mrow></math></span>, precision, and recall compared to other good YOLO series models. Code is available at <span><span>https://github.com/xyk1300332643/BEG-YOLO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 15-21"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Progressive self-supervised learning: A pre-training method for crowd counting","authors":"Yao Gu , Zhe Zheng , Yingna Wu, Guangping Xie, Na Ni","doi":"10.1016/j.patrec.2024.12.007","DOIUrl":"10.1016/j.patrec.2024.12.007","url":null,"abstract":"<div><div>Crowd counting technologies possess substantial social significance, and deep learning methods are increasingly seen as potent tools for advancing this field. Traditionally, many approaches have sought to enhance model performance by transferring knowledge from ImageNet, utilizing its classification weights to initialize models. However, the application of these pre-training weights is suboptimal for crowd counting, which involves dense prediction significantly different from image classification. To address these limitations, we introduce a progressive self-supervised learning approach, designed to generate more suitable pre-training weights from a large collection of density-related images. We gathered 173k images using custom-designed prompts and implemented a two-stage learning process to refine the feature representations of image patches with similar densities. In the first stage, mutual information between overlapping patches within the same image is maximized. Subsequently, a combination of global and local losses is evaluated to enhance feature similarity, with the latter assessing patches from different images of comparable densities. Our innovative pre-training approach demonstrated substantial improvements, reducing the Mean Absolute Error (MAE) by 7.5%, 17.6%, and 28.7% on the ShanghaiTech Part A & Part B and UCF_QNRF datasets respectively. Furthermore, when these pre-training weights were used to initialize existing models, such as CSRNet for density map regression and DM-Count for point supervision, a significant enhancement in performance was observed.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 148-154"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep network pruning: A comparative study on CNNs in face recognition","authors":"Fernando Alonso-Fernandez , Kevin Hernandez-Diaz , Jose Maria Buades Rubio , Prayag Tiwari , Josef Bigun","doi":"10.1016/j.patrec.2025.01.023","DOIUrl":"10.1016/j.patrec.2025.01.023","url":null,"abstract":"<div><div>The widespread use of mobile devices for all kinds of transactions makes necessary reliable and real-time identity authentication, leading to the adoption of face recognition (FR) via the cameras embedded in such devices. Progress of deep Convolutional Neural Networks (CNNs) has provided substantial advances in FR. Nonetheless, the size of state-of-the-art architectures is unsuitable for mobile deployment, since they often encompass hundreds of megabytes and millions of parameters. We address this by studying methods for deep network compression applied to FR. In particular, we apply network pruning based on Taylor scores, where less important filters are removed iteratively. The method is tested on three networks based on the small SqueezeNet (1.24M parameters) and the popular MobileNetv2 (3.5M) and ResNet50 (23.5M) architectures. These have been selected to showcase the method on CNNs with different complexities and sizes. We observe that a substantial percentage of filters can be removed with minimal performance loss. Also, filters with the highest amount of output channels tend to be removed first, suggesting that high-dimensional spaces within popular CNNs are over-dimensioned. The models of this paper are available at <span><span>https://github.com/HalmstadUniversityBiometrics/CNN-pruning-for-face-recognition</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 221-228"},"PeriodicalIF":3.9,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}