Pattern Recognition Letters: Latest Articles

Hierarchical memory-enhanced networks for student knowledge tracing
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2026-01-03), DOI: 10.1016/j.patrec.2026.01.002
Huali Yang, Junjie Hu, Tao Huang, Shengze Hu, Wang Gao, Zhuoran Xu, Jing Geng

Accurate recognition of students' knowledge states is critical for personalized education in the field of intelligent education. Knowledge tracing (KT) has emerged as an important research domain for tracing students' knowledge states through the analysis of learning-trajectory data. However, existing KT methods tend to overlook the hierarchical nature of memory, resulting in incomplete memory transfer. To address this issue, this study proposes a novel hierarchical memory-enhanced knowledge tracing (HMEKT) method that models the hierarchical structure of memory. HMEKT consists of three modules: shallow memory, deep memory, and performance prediction. In the shallow memory module, learning and forgetting mechanisms simulate memory growth and decay, capturing the dynamic changes in knowledge states. In the deep memory module, a dynamic memory matrix stores the student's core knowledge system, and shallow memory is transferred into deep memory through enhancement and reduction gates that control the transfer. Finally, to predict student performance, relevant knowledge states are aggregated from the knowledge-system matrix for future questions. Experiments on four datasets demonstrate the effectiveness of the model, with a 1.99% AUC gain on Assistment2017 compared to state-of-the-art methods.

Volume 201, Pages 37-44. Citations: 0
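The abstract names enhancement and reduction gates that move shallow memory into a deep memory matrix but does not give their equations. A minimal sketch of what such a gated transfer could look like, in the erase-then-add style of key-value memory networks (the gate parameterization here is an assumption, not the paper's formula):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def transfer_memory(shallow, deep, W_e, W_r):
    """One hypothetical shallow-to-deep memory transfer step.

    shallow: (d,) shallow-memory state for one concept
    deep:    (d,) corresponding slot of the deep memory matrix
    W_e, W_r: (d, 2d) assumed parameters of the enhancement / reduction gates
    """
    h = np.concatenate([shallow, deep])   # joint view of both memory levels
    enhance = sigmoid(W_e @ h)            # gate: how much shallow memory to write
    reduce_ = sigmoid(W_r @ h)            # gate: how much deep memory to erase
    # erase-then-add update, as in key-value memory networks
    return deep * (1.0 - reduce_) + enhance * shallow

rng = np.random.default_rng(0)
d = 4
new_deep = transfer_memory(rng.random(d), rng.random(d),
                           rng.standard_normal((d, 2 * d)),
                           rng.standard_normal((d, 2 * d)))
print(new_deep.shape)  # (4,)
```

With zero-weight gates both sigmoids output 0.5, so the update halves the deep slot; trained weights would learn when to retain versus overwrite.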
AMDC: Attenuation map-guided dual-color space for underwater image color correction
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2026-01-13), DOI: 10.1016/j.patrec.2026.01.005
Shilong Sun, Baiqiang Yu, Ling Zhou, Junpeng Xu, Wenyi Zhao, Weidong Zhang

Underwater images frequently exhibit color distortions due to wavelength-dependent light attenuation and absorption, further complicated by irregular underwater lighting. Traditional color correction methods primarily target global light attenuation but are less effective at handling local color shifts caused by discontinuous depth variations and artificial illumination. To address this issue, we propose a dual-space adaptive color correction method guided by an attenuation map, referred to as AMDC. Specifically, we first perform global attenuation compensation by leveraging the maximum reference channel of the image. Building on the globally compensated result, we then introduce a dual-space collaborative correction strategy: in RGB space, local adaptive compensation using a weighted sliding window; in CIELab space, restoration of color saturation through a zero-symmetric adaptive offset correction. To retain the most visually optimal color features, we selectively fuse the a and b channels from the two correction results, producing a locally corrected image. Finally, we use the maximum attenuation map of the raw image to guide the fusion of the locally corrected image with the raw image, generating the final color-corrected output. Extensive qualitative and quantitative experiments demonstrate the effectiveness and robustness of our method for underwater image color correction.

Volume 201, Pages 80-86. Citations: 0
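The "global attenuation compensation by leveraging the maximum reference channel" step can be illustrated with the classic Ancuti-style compensation formula, where the strongest channel (typically green underwater) lifts the attenuated ones. This is a sketch of the general technique, not necessarily AMDC's exact formula:

```python
import numpy as np

def compensate_global(img):
    """Global attenuation compensation sketch for an RGB image in [0, 1].

    The channel with the highest mean intensity serves as the reference;
    weaker channels are lifted toward it in proportion to their deficit.
    """
    means = img.reshape(-1, 3).mean(axis=0)
    ref = int(np.argmax(means))                 # maximum reference channel
    out = img.copy()
    for c in range(3):
        if c == ref:
            continue
        # lift channel c where it is dark and the reference is bright
        out[..., c] = img[..., c] + (means[ref] - means[c]) * \
                      (1.0 - img[..., c]) * img[..., ref]
    return np.clip(out, 0.0, 1.0)
```

For a green-dominant image the red and blue channels are raised while the reference channel is left untouched, which is the "global" half of AMDC before its local dual-space correction.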
Entropy calibrated prototype embedding for transductive few-shot learning
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2026-01-13), DOI: 10.1016/j.patrec.2026.01.015
Mengfei Guo, Jiahui Wang, Qin Xu, Bo Jiang, Bin Luo

Transductive few-shot learning aims to generalize to new classes from limited labeled support samples and all unlabeled query samples. Widely adopted paradigms include prototypical networks and graph-based label propagation: the former classifies queries by their distances to class prototypes, while the latter propagates labels from support samples. However, existing methods typically treat all samples as equally important, neglect their inherent reliability, and underutilize prototypes by treating them merely as static anchors. This paper proposes Entropy Calibrated Prototype Embedding (ECPE), a novel framework that integrates prototypical networks and label propagation while addressing their respective limitations through an iterative refinement strategy. First, we propose Entropy Calibration (EC), which dynamically assesses sample reliability using prediction entropy to weight each sample's influence in label propagation. Second, the proposed Entropy-aware Prototype Embedding (EPE) treats prototypes as evolving synthetic nodes, iteratively updating them based on calibrated predictions and embedding high-certainty prototypes into the graph. By iterating label calibration, entropy-aware prototype embedding, and label propagation, ECPE enhances classification accuracy and robustness. Extensive experiments demonstrate that ECPE surpasses state-of-the-art performance on three standard transductive FSL benchmarks. Our source code is published at https://github.com/gmf-ahu/ECPE.

Volume 201, Pages 138-144. Citations: 0
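The entropy-calibration idea, assessing reliability from prediction entropy, has a natural minimal form: normalize each sample's Shannon entropy by the maximum (uniform-distribution) entropy and use the complement as a weight. ECPE's exact calibration may differ; this is one standard instantiation:

```python
import numpy as np

def entropy_weights(probs, eps=1e-12):
    """Map per-sample class probabilities to reliability weights in [0, 1].

    probs: (n, C) softmax predictions for n query samples.
    A confident (low-entropy) prediction gets a weight near 1; a uniform
    (maximum-entropy) one gets a weight of 0.
    """
    p = np.clip(probs, eps, 1.0)
    ent = -(p * np.log(p)).sum(axis=1)   # Shannon entropy per sample
    max_ent = np.log(probs.shape[1])     # entropy of the uniform distribution
    return 1.0 - ent / max_ent
```

These weights can then scale each query's contribution in the label-propagation step, so unreliable predictions spread less influence through the graph.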
Compressing model with few class-imbalance samples: An out-of-distribution expedition
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2026-01-13), DOI: 10.1016/j.patrec.2026.01.010
Tian-Shuang Wu, Shen-Huan Lyu, Yanyan Wang, Ning Chen, Zhihao Qu, Baoliu Ye

Few-sample model compression aims to compress a large pre-trained model into a compact one using only a few samples. However, previous methods typically assume a balanced class distribution, which is costly to guarantee under severe data scarcity. In the presence of imbalance, the compressed model exhibits significant performance degradation. We propose a novel framework named OOD-Enhanced Few-Sample Model Compression (OE-FSMC), which introduces out-of-distribution (OOD) samples with dynamically assigned labels to prevent bias during compression. To avoid overfitting the OOD samples, we incorporate a joint distillation loss and a class-dependent regularization term. Extensive experiments on multiple benchmark datasets show that the framework can be seamlessly incorporated into existing few-sample model compression methods, effectively mitigating the accuracy degradation caused by class imbalance.

Volume 201, Pages 117-124. Citations: 0
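A joint distillation loss of the kind the abstract mentions typically combines cross-entropy on the few labeled samples with a temperature-scaled KL term toward the teacher. The sketch below shows that standard combination; OE-FSMC's class-dependent regularizer is omitted, and the weighting `alpha` and temperature `T` are illustrative:

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Cross-entropy on labeled samples plus temperature-scaled KL to the
    teacher. The T*T factor keeps gradient magnitudes comparable across
    temperatures, as in standard knowledge distillation."""
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(len(labels)), labels] + 1e-12).mean()
    pt = softmax(teacher_logits, T)
    ps = softmax(student_logits, T)
    kl = (pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12))).sum(axis=1).mean()
    return alpha * ce + (1 - alpha) * (T * T) * kl
```

When the student matches the teacher exactly, the KL term vanishes and only the (small) supervised term remains, which is the regime compression aims for.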
MER-CAPF: Audio-text emotion recognition through cross-attention mechanism and multi-granularity pooling strategy
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2026-01-13), DOI: 10.1016/j.patrec.2026.01.008
Chengming Chen, Pengyuan Liu, Zhicheng Dong, Zhuo He, Zhijian Li

In Human-Computer Interaction (HCI), emotion recognition is a critical yet challenging task due to its multimodal nature and limitations in data acquisition. To accurately recognize multimodal emotional information such as speech and text, this paper proposes a novel multimodal emotion recognition framework, MER-CAPF (Multimodal Emotion Recognition with Cross-Attention and Pooling Fusion). The framework employs a hierarchically frozen BERT model for the text modality and a depthwise separable convolutional neural network (DSCNN) combined with a Bi-LSTM for the audio modality. During feature fusion, a multi-head cross-attention mechanism and a multi-granularity pooling strategy capture semantic and acoustic associations across modalities. In addition, the model incorporates parallel modality encoders with a progressive modality alignment mechanism to achieve synergistic alignment and deep interaction between speech and text features. Experiments on three public benchmark datasets demonstrate that MER-CAPF achieves accuracies of 74.73% on IEMOCAP, 63.26% on MELD, and 67.38% on CMU-MOSEI, outperforming most existing methods and reaching a level comparable to recent state-of-the-art models, thereby validating the effectiveness and robustness of the proposed framework.

Volume 201, Pages 125-131. Citations: 0
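The cross-attention mechanism at the core of the fusion stage is standard scaled dot-product attention with queries from one modality and keys/values from the other. A single-head sketch (MER-CAPF uses a multi-head variant, and real implementations add learned projection matrices):

```python
import numpy as np

def cross_attention(q_feats, kv_feats, d_k=None):
    """Single-head cross-attention: e.g. text tokens (queries) attend over
    audio frames (keys/values), yielding audio-informed text features.

    q_feats:  (n_q, d) query-side features
    kv_feats: (n_kv, d) key/value-side features
    """
    d_k = d_k or q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d_k)   # (n_q, n_kv) similarities
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # softmax over kv positions
    return attn @ kv_feats                         # convex mix of kv features
```

Each output row is a convex combination of the key/value rows, so the attended features stay inside the range of the other modality's features.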
Wavelet-based diffusion transformer for image dehazing
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2026-01-10), DOI: 10.1016/j.patrec.2026.01.016
Cheng Ma, Guojun Liu, Jing Yue

In current diffusion-based image dehazing methods, few studies explore and leverage the inherent prior knowledge of hazy images. Moreover, the inherent complexity of these models often makes training difficult, which in turn leads to poor restoration in dense haze. To address these challenges, this paper proposes a dehazing diffusion model based on Haar wavelet priors, fully exploiting the fact that haze information is concentrated in the low-frequency region. Specifically, the Haar wavelet transform first decomposes the hazy image, and the diffusion model generates the low-frequency information, reconstructing the main colors and content of the dehazed image. A Gabor-based high-frequency enhancement module then extracts high-frequency details through multi-directional Gabor convolution filters, further improving fine-grained restoration. Subsequently, a multi-scale pooling block reduces blocky artifacts caused by non-uniform haze, enhancing visual consistency. Finally, the effectiveness of the proposed method is demonstrated on publicly available datasets, and the model's generalization is tested on real hazy image datasets, along with its potential for other downstream tasks. The code is available at https://github.com/Mccc1003/WDiT_Dehaze-main.

Volume 201, Pages 58-65. Citations: 0
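The one-level 2-D Haar transform that separates the haze-carrying low-frequency band from the detail bands is simple enough to write out directly (a minimal numpy version for even-sized grayscale images; the paper applies it per channel inside a larger pipeline):

```python
import numpy as np

def haar2d(img):
    """One-level 2-D Haar decomposition of an even-sized grayscale image.

    Returns (LL, LH, HL, HH). Haze energy concentrates in the LL
    (low-frequency) band, which is what the diffusion model regenerates;
    the three detail bands carry edges and texture.
    """
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```

For a perfectly flat image all three detail bands are zero, which is why replacing only LL lets a generative model change global color and haze while the detail bands preserve structure.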
MR-DETR: Miss reduction DETR with context frequency attention and adaptive query allocation strategy for small object detection
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2026-01-09), DOI: 10.1016/j.patrec.2026.01.004
Hailan Shen, Zihan Wang, Shuo Huang, Zailiang Chen

Small object detection is a critical computer vision task that aims to accurately detect tiny instances within images. Although DETR-based methods have improved general object detection, they often miss small objects due to their limited size and indistinct features. Moreover, DETR-based methods employ a fixed number of queries, making it difficult to adapt to dynamic scene variations. In this study, we propose Miss Reduction DETR (MR-DETR), which leverages Context Frequency Attention (CFA) and an Adaptive Query Allocation Strategy (AQAS) to reduce missed detections. First, to better capture fine details of small objects, CFA is designed with two complementary branches: a context branch that employs axial strip convolutions to capture global contextual information, and a frequency branch that uses a frequency modulation module to emphasize local high-frequency details. Next, AQAS is introduced; it applies feature excitation and compression to the encoder's output maps, dynamically evaluates object density, and automatically adjusts the number of queries through a density-to-query mapping, improving adaptability in complex scenes and reducing missed detections. Experimental results demonstrate that MR-DETR achieves state-of-the-art detection performance on the aerial image datasets VisDrone and AI-TOD, which mainly contain small objects.

Volume 201, Pages 52-57. Citations: 0
Audio prompt driven reprogramming for diagnosing major depressive disorder
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2025-12-24), DOI: 10.1016/j.patrec.2025.12.007
Hyunseo Kim, Longbin Jin, Eun Yi Kim

Diagnosing depression is critical due to its profound impact on individuals and the associated risks. Although deep learning techniques such as convolutional neural networks and transformers have been employed to detect depression, they require large labeled datasets and substantial computational resources, posing challenges in data-scarce environments. We introduce p-DREAM (Prompt-Driven Reprogramming Exploiting Audio Mapping), a novel, data-efficient model designed to diagnose depression from speech data alone. p-DREAM combines two main strategies: data augmentation and model reprogramming. First, it uses audio-specific data augmentation techniques to generate a richer set of training examples. Next, it employs audio prompts to aid domain adaptation: these prompts guide a frozen pre-trained transformer, which extracts meaningful features that are then fed into a lightweight classifier for prediction. p-DREAM outperforms traditional fine-tuning and linear probing methods while requiring only a small number of trainable parameters. Evaluations on three benchmark datasets (DAIC-WoZ, E-DAIC, and AVEC 2014) demonstrate consistent improvements; in particular, p-DREAM achieves a leading macro F1 score of 0.7734 using only acoustic features. Ablation studies on prompt length, position, and initialization confirm their importance for effective model adaptation. p-DREAM offers a practical and privacy-conscious approach to speech-based depression assessment in low-resource environments. To promote reproducibility and community adoption, we plan to release our codebase in compliance with the ethical protocols outlined in the AVEC challenges.

Volume 201, Pages 1-8. Citations: 0
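The reprogramming strategy boils down to prepending trainable prompt vectors to the input sequence of a frozen transformer, so only the prompts and a small classifier head are updated. A shape-level sketch (the prompt length of 8 and feature width of 16 are illustrative, not the paper's configuration):

```python
import numpy as np

def prepend_prompts(audio_tokens, prompt):
    """Prepend trainable prompt vectors to an audio token sequence before it
    enters a frozen pre-trained transformer. During training, gradients flow
    only into `prompt`; the backbone weights stay fixed."""
    return np.concatenate([prompt, audio_tokens], axis=0)

rng = np.random.default_rng(0)
prompt = rng.standard_normal((8, 16)) * 0.02   # 8 prompt tokens, small init
tokens = np.zeros((50, 16))                    # 50 audio frames of features
print(prepend_prompts(tokens, prompt).shape)   # (58, 16)
```

The paper's ablations over prompt length, position, and initialization correspond to varying exactly these three choices.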
Fine-tuning ImageNet-pretrained models in medical image classification: Reassessing the impact of different factors
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2026-01-16), DOI: 10.1016/j.patrec.2026.01.017
Juan Miguel Valverde, Vandad Imani, Jussi Tohka

Fine-tuning ImageNet-pretrained convolutional neural networks is a widely used strategy in medical image classification. Previous studies investigating the benefits of ImageNet pretraining over training from scratch have reached conflicting findings, likely due to a lack of standardization in the experiments. Here, we identify various previously overlooked factors and propose a set of standardized experiments that account for them, helping to clarify whether pretraining on ImageNet is truly advantageous. Our experiments revealed that dataset-independent factors (training set size, training time, and model size) cannot predict whether ImageNet pretraining will be beneficial, because its benefits depend on other dataset- and implementation-specific factors such as task difficulty and model architecture. We conclude that past demonstrations of the effectiveness of ImageNet pretraining are not universal, and that its potential advantages should be empirically evaluated in each scenario separately.

Volume 201, Pages 132-137. Citations: 0
Frequency-selective countnet: Enhancing text-guided object counting with frequency features
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters, Pub Date: 2026-03-01 (Epub: 2025-12-27), DOI: 10.1016/j.patrec.2025.12.014
Cheng Qian, Jiwu Cao, Ying Mao, Ruotian Zhang, Fei Long, Jun Sang

Text-guided object counting aims to estimate the number of objects described by natural language within complex visual scenes. However, existing approaches often struggle to align textual intent with diverse visual patterns, especially when target objects vary in scale, appearance, or context. To address these limitations, we propose Frequency-Selective CountNet (FSCNet), a novel framework that integrates spatial and frequency-domain features for precise text-guided counting. FSCNet introduces a Triple-Stream Attention Fusion Module (TSAFM) that combines textual, global, and local visual features. Additionally, an Adaptive Frequency Selector (AFS) dynamically emphasizes frequency components by separately modulating the magnitude and phase spectra, preserving geometric consistency during decoding. Extensive experiments on the FSC-147 and CARPK datasets demonstrate that FSCNet achieves state-of-the-art performance, outperforming the previous best methods by 18.34% in MAE and 27.41% in RMSE on FSC-147 (Avg.) and by 5.17%/7.58% on CARPK.

Volume 201, Pages 15-21. Citations: 0
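The AFS idea of modulating magnitude and phase spectra separately can be sketched with a plain FFT round trip. In FSCNet the gains are predicted per component by a small network; scalar gains here are a simplifying assumption to show the mechanism:

```python
import numpy as np

def modulate_frequency(feat, mag_gain=1.0, phase_shift=0.0):
    """Split a 2-D feature map into magnitude and phase spectra, modulate
    each separately, then recombine. Keeping phase near-intact is what
    preserves geometric structure while the magnitude is re-weighted."""
    spec = np.fft.fft2(feat)
    mag, phase = np.abs(spec), np.angle(spec)
    new_spec = (mag * mag_gain) * np.exp(1j * (phase + phase_shift))
    return np.fft.ifft2(new_spec).real
```

With unit gain and zero shift the round trip is the identity; scaling only the magnitude rescales the feature map without distorting its geometry, which is the property AFS exploits.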