Pattern Recognition Letters: Latest Articles

Motion-guided small MAV detection in complex and non-planar scenes
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-10-01 DOI: 10.1016/j.patrec.2024.09.013
{"title":"Motion-guided small MAV detection in complex and non-planar scenes","authors":"","doi":"10.1016/j.patrec.2024.09.013","DOIUrl":"10.1016/j.patrec.2024.09.013","url":null,"abstract":"<div><div>In recent years, there has been a growing interest in the visual detection of micro aerial vehicles (MAVs) due to its importance in numerous applications. However, the existing methods based on either appearance or motion features encounter difficulties when the background is complex or the MAV is too small. In this paper, we propose a novel motion-guided MAV detector that can accurately identify small MAVs in complex and non-planar scenes. This detector first exploits a motion feature enhancement module to capture the motion features of small MAVs. Then it uses multi-object tracking and trajectory filtering to eliminate false positives caused by motion parallax. Finally, an appearance-based classifier and an appearance-based detector that operates on the cropped regions are used to achieve precise detection results. Our proposed method can effectively and efficiently detect extremely small MAVs from dynamic and complex backgrounds because it aggregates pixel-level motion features and eliminates false positives based on the motion and appearance features of MAVs. Experiments on the ARD-MAV dataset demonstrate that the proposed method could achieve high performance in small MAV detection under challenging conditions and outperform other state-of-the-art methods across various metrics.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
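The pipeline above is learned end to end in the paper; purely to illustrate the motion-cue → crop → appearance-check structure it describes, the sketch below substitutes classical frame differencing for the motion feature enhancement module and leaves the appearance verifier as a placeholder callable (video_path and classify_crop are assumptions, not parts of the paper's method):

```python
import cv2
import numpy as np

def moving_candidates(prev_gray, cur_gray, min_area=9):
    """Frame-difference motion cue: bounding boxes of moving blobs."""
    diff = cv2.absdiff(cur_gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

def detect_small_mavs(video_path, classify_crop):
    """classify_crop(img) -> bool is a placeholder appearance-based verifier."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    detections = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in moving_candidates(prev_gray, gray):
            crop = frame[y:y + h, x:x + w]
            if classify_crop(crop):  # appearance check filters parallax false positives
                detections.append((x, y, w, h))
        prev_gray = gray
    cap.release()
    return detections
```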
Rethinking unsupervised domain adaptation for semantic segmentation
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-10-01 DOI: 10.1016/j.patrec.2024.09.022
{"title":"Rethinking unsupervised domain adaptation for semantic segmentation","authors":"","doi":"10.1016/j.patrec.2024.09.022","DOIUrl":"10.1016/j.patrec.2024.09.022","url":null,"abstract":"<div><div>Unsupervised domain adaptation (UDA) adapts a model trained on one domain (called source) to a novel domain (called target) using only unlabeled data. Due to its high annotation cost, researchers have developed many UDA methods for semantic segmentation, which assume no labeled sample is available in the target domain. We question the practicality of this assumption for two reasons. First, after training a model with a UDA method, we must somehow verify the model before deployment. Second, UDA methods have at least a few hyper-parameters that need to be determined. The surest solution to these is to evaluate the model using validation data, i.e., a certain amount of labeled target-domain samples. This question about the basic assumption of UDA leads us to rethink UDA from a data-centric point of view. Specifically, we assume we have access to a minimum level of labeled data. Then, we ask how much is necessary to find good hyper-parameters of existing UDA methods. We then consider what if we use the same data for supervised training of the same model, e.g., finetuning. We conducted experiments to answer these questions with popular scenarios, {GTA5, SYNTHIA}<span><math><mo>→</mo></math></span>Cityscapes. We found that i) choosing good hyper-parameters needs only a few labeled images for some UDA methods whereas a lot more for others; and ii) simple finetuning works surprisingly well; it outperforms many UDA methods if only several dozens of labeled images are available.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
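Finding ii), that plain finetuning on a few dozen labeled target images is a strong baseline, is straightforward to set up. A minimal sketch of such a finetuning loop, assuming a DeepLabV3 model and Cityscapes-style labels (the model choice, batch size, learning rate, and the labeled_target_ds dataset are assumptions; in the paper the starting point would be the source-trained model rather than random weights):

```python
import torch
from torch.utils.data import DataLoader
from torchvision.models.segmentation import deeplabv3_resnet50

def finetune_on_target(labeled_target_ds, num_classes=19, epochs=50, lr=1e-4):
    """Supervised finetuning on a handful of labeled target-domain images."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = deeplabv3_resnet50(weights=None, num_classes=num_classes).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=255)  # 255 = void label in Cityscapes
    loader = DataLoader(labeled_target_ds, batch_size=2, shuffle=True)
    model.train()
    for _ in range(epochs):
        for images, masks in loader:  # images: (B,3,H,W), masks: (B,H,W) with class ids
            images, masks = images.to(device), masks.to(device)
            logits = model(images)["out"]
            loss = criterion(logits, masks)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```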
Discrete diffusion models with Refined Language-Image Pre-trained representations for remote sensing image captioning
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-10-01 DOI: 10.1016/j.patrec.2024.09.019
{"title":"Discrete diffusion models with Refined Language-Image Pre-trained representations for remote sensing image captioning","authors":"","doi":"10.1016/j.patrec.2024.09.019","DOIUrl":"10.1016/j.patrec.2024.09.019","url":null,"abstract":"<div><div>RS image captioning (RSIC) utilizes natural language to provide a description of image content, assisting in the comprehension of object properties and relationships. Nonetheless, RS images are characterized by variations in object scales, distributions, and quantities, which make it challenging to obtain global semantic information and object connections. To enhance the accuracy of captions produced from RS images, this paper proposes a novel method referred to as Discrete Diffusion Models with Refined Language-Image Pre-trained representations (DDM-RLIP), leveraging an advanced discrete diffusion model (DDM) for nosing and denoising text tokens. DDM-RLIP is based on an advanced DDM-based method designed for natural pictures. The primary approach for refining image representations involves fine-tuning a CLIP image encoder on RS images, followed by adapting the transformer with an additional attention module to focus on crucial image regions and relevant words. Furthermore, experiments were conducted on three datasets, Sydney-Captions, UCM-Captions, and NWPU-Captions, and the results demonstrated the superior performance of the proposed method compared to conventional autoregressive models. On the NWPU-Captions dataset, the CIDEr score improved from 116.4 to 197.7, further validating the efficacy and potential of DDM-RLIP. The implementation codes for our approach DDM-RLIP are available at <span><span>https://github.com/Leng-bingo/DDM-RLIP</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
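The abstract does not spell out the corruption scheme, so the sketch below uses a common absorbing-state ("mask") discrete diffusion step purely to illustrate what noising and denoising text tokens means during training: corrupt a fraction of caption tokens, then ask a conditional denoiser to predict the originals at the masked positions. MASK_ID and the denoiser signature are assumptions, not DDM-RLIP's actual interface:

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed id of a dedicated absorbing [MASK] token

def noise_tokens(tokens, t, T):
    """Absorbing-state forward process: each token is masked with probability t/T."""
    keep = torch.rand_like(tokens, dtype=torch.float) >= (t / T)
    return torch.where(keep, tokens, torch.full_like(tokens, MASK_ID))

def diffusion_caption_loss(denoiser, image_feats, tokens, T=20):
    """One training step: corrupt the caption, predict the clean tokens,
    and score only the positions that were actually masked."""
    t = torch.randint(1, T + 1, (1,)).item()
    noisy = noise_tokens(tokens, t, T)
    logits = denoiser(noisy, image_feats, t)  # (B, L, vocab); denoiser is a placeholder
    masked = noisy.eq(MASK_ID)
    if not masked.any():
        return logits.new_zeros(())
    return F.cross_entropy(logits[masked], tokens[masked])
```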
Personalized Federated Learning on long-tailed data via knowledge distillation and generated features
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-10-01 DOI: 10.1016/j.patrec.2024.09.024
{"title":"Personalized Federated Learning on long-tailed data via knowledge distillation and generated features","authors":"","doi":"10.1016/j.patrec.2024.09.024","DOIUrl":"10.1016/j.patrec.2024.09.024","url":null,"abstract":"<div><div>Personalized Federated Learning (PFL) offers a novel paradigm for distributed learning, which aims to learn a personalized model for each client through collaborative training of all distributed clients in a privacy-preserving manner. However, the performance of personalized models is often compromised by data heterogeneity and the challenges of long-tailed distributions, both of which are common in real-world applications. In this paper, we explore the joint problem of data heterogeneity and long-tailed distribution in PFL and propose a corresponding solution called Personalized Federated Learning with Distillation and generated Features (PFLDF). Specifically, we employ a lightweight generator trained on the server to generate a balanced feature set for each client that can supplement local minority class information with global class information. This augmentation mechanism is a robust countermeasure against the adverse effects of data imbalance. Subsequently, we use knowledge distillation to transfer the knowledge of the global model to personalized models to improve their generalization performance. Extensive experimental results show the superiority of PFLDF compared to other state-of-the-art PFL methods with long-tailed data distribution.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
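The distillation step, transferring the global model's knowledge into each client's personalized model, follows the standard knowledge-distillation recipe. A minimal sketch (the temperature, loss weighting, and optimizer are assumptions; the balanced-feature generator is omitted):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on local labels plus KL to the frozen global teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd

def local_update(personal_model, global_model, loader, lr=1e-3, device="cpu"):
    """One client round: train the personalized model on local data plus the global teacher."""
    personal_model.to(device)
    global_model.to(device).eval()
    opt = torch.optim.SGD(personal_model.parameters(), lr=lr)
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            t_logits = global_model(x)
        s_logits = personal_model(x)
        loss = distillation_loss(s_logits, t_logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return personal_model
```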
A unified framework to stereotyped behavior detection for screening Autism Spectrum Disorder
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-10-01 DOI: 10.1016/j.patrec.2024.10.001
{"title":"A unified framework to stereotyped behavior detection for screening Autism Spectrum Disorder","authors":"","doi":"10.1016/j.patrec.2024.10.001","DOIUrl":"10.1016/j.patrec.2024.10.001","url":null,"abstract":"<div><div>We propose a unified pipeline for the task of stereotyped behaviors detection for early diagnosis of Autism Spectrum Disorder (ASD). Current methods for analyzing autism-related behaviors of ASD children primarily focus on action classification tasks utilizing pre-trimmed video segments, limiting their real-world applicability. To overcome these challenges, we develop a two-stage network for detecting stereotyped behaviors: one for temporally localizing repetitive actions and another for classifying behavioral types. Specifically, building on the observation that stereotyped behaviors commonly manifest in various repetitive forms, our method proposes an approach to localize video segments where arbitrary repetitive behaviors are observed. Subsequently, we classify the detailed types of behaviors within these localized segments, identifying actions such as arm flapping, head banging, and spinning. Extensive experimental results on SSBD and ESBD datasets demonstrate that our proposed pipeline surpasses existing baseline methods, achieving a classification accuracy of 88.3% and 88.6%, respectively. The code and dataset will be publicly available at <span><span>https://github.com/etri/AI4ASD/tree/main/pbr4RRB</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
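Stage one, localizing segments that contain arbitrary repetitive motion, can be illustrated with a classical periodicity cue: score sliding windows of a per-frame motion-energy signal (e.g., mean absolute frame difference or joint displacement) by their autocorrelation peak. This is only an illustration of the idea; the paper's localizer is a learned network, and the window size, stride, and threshold below are assumptions:

```python
import numpy as np

def periodicity_score(signal, min_lag=4):
    """Peak of the normalized autocorrelation beyond min_lag; high = repetitive."""
    s = signal - signal.mean()
    if s.std() < 1e-8:
        return 0.0
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]
    ac = ac / ac[0]
    return float(ac[min_lag:].max())

def repetitive_segments(motion_energy, win=64, stride=16, thresh=0.6):
    """Slide a window over a per-frame motion-energy signal and keep periodic windows."""
    segments = []
    for start in range(0, len(motion_energy) - win + 1, stride):
        if periodicity_score(motion_energy[start:start + win]) > thresh:
            segments.append((start, start + win))
    return segments
```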
A complex neural network model by Hilbert Transform
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-10-01 DOI: 10.1016/j.patrec.2024.09.021
{"title":"A complex neural network model by Hilbert Transform","authors":"","doi":"10.1016/j.patrec.2024.09.021","DOIUrl":"10.1016/j.patrec.2024.09.021","url":null,"abstract":"<div><div>The phase information of the optical wave plays a vital role in processing wave-related signals. In deep learning fields, complex-valued neural networks are put forward on the concept of complex amplitudes for full utilization of phase information. To build a complex-valued neural network, the common way is to exploit Fourier Transform of the observed signal to extract amplitude and phase information. However, this will lead to spectrum waste for a real-valued signal by introducing negative frequencies that have no physical meaning. To this end, we attempt to use Hilbert Transform as an alternative to yield a single sideband spectrum and avoid negative frequencies from interacting with positive ones. On the other hand, Fourier transform is a global analysis thus it tells nothing about the time domain. As our key insight, we further explore the usage of instantaneous frequency calculated by Hilbert Transform and propose a new method of constructing complex input from a time–frequency angle. Simple pixel-wise classification experiments are carried out on two hyperspectral datasets and MNIST dataset. Experimental results have demonstrated that Hilbert Transform with instantaneous frequency performs better by a large margin than Fourier Transform owing to the additional time information.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
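The construction described above, the analytic signal via the Hilbert Transform plus instantaneous frequency as the extra time-domain cue, takes only a few lines with SciPy. A minimal sketch (the sampling rate fs is whatever the input signal was recorded at):

```python
import numpy as np
from scipy.signal import hilbert

def complex_input_with_if(x, fs):
    """Analytic signal (one-sided spectrum) plus instantaneous frequency.

    x : (N,) real-valued 1-D signal, fs : sampling rate in Hz.
    Returns the complex analytic signal, its envelope, and the per-sample
    instantaneous frequency in Hz.
    """
    analytic = hilbert(x)                                   # x + j * HilbertTransform(x)
    amplitude = np.abs(analytic)                             # envelope
    phase = np.unwrap(np.angle(analytic))                    # instantaneous phase
    inst_freq = np.diff(phase) / (2.0 * np.pi) * fs          # Hz, length N-1
    inst_freq = np.concatenate([inst_freq[:1], inst_freq])   # pad back to length N
    return analytic, amplitude, inst_freq
```

For a pure tone cos(2*pi*f0*t), inst_freq stays close to f0 throughout, whereas the two-sided Fourier spectrum also places energy at the physically meaningless -f0.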
Uncertainty quantification metrics for deep regression
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-09-19 DOI: 10.1016/j.patrec.2024.09.011
{"title":"Uncertainty quantification metrics for deep regression","authors":"","doi":"10.1016/j.patrec.2024.09.011","DOIUrl":"10.1016/j.patrec.2024.09.011","url":null,"abstract":"<div><div>When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for uncertainty quantification. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error (CE), Spearman’s Rank Correlation, and Negative Log-Likelihood (NLL). Using multiple datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman’s Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142319421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
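Of these metrics, AUSE is the one most often re-implemented from scratch, so a compact reference computation is useful. A minimal sketch assuming per-sample absolute errors and equal removal fractions (the number of steps and the absence of normalization by the full-set error are choices here, not a canonical definition):

```python
import numpy as np

def _sparsification_curve(errors, ascending_order, steps=20):
    """Mean remaining error after removing the top fraction under the given ranking."""
    sorted_err = errors[ascending_order]
    n = len(errors)
    fracs = np.linspace(0.0, 0.95, steps)
    return np.array([sorted_err[: max(1, int(round(n * (1.0 - f))))].mean() for f in fracs])

def ause(errors, uncertainties, steps=20):
    """Area Under the Sparsification Error: average gap between the curve obtained by
    removing the most *uncertain* samples first and the oracle curve obtained by
    removing the largest *errors* first. Lower is better."""
    errors = np.asarray(errors, dtype=float)
    uncertainties = np.asarray(uncertainties, dtype=float)
    unc_curve = _sparsification_curve(errors, np.argsort(uncertainties), steps)
    oracle_curve = _sparsification_curve(errors, np.argsort(errors), steps)
    return float(np.mean(unc_curve - oracle_curve))
```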
CvFormer: Cross-view transFormers with pre-training for fMRI analysis of human brain
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-09-17 DOI: 10.1016/j.patrec.2024.09.010
{"title":"CvFormer: Cross-view transFormers with pre-training for fMRI analysis of human brain","authors":"","doi":"10.1016/j.patrec.2024.09.010","DOIUrl":"10.1016/j.patrec.2024.09.010","url":null,"abstract":"<div><p>In recent years, functional magnetic resonance imaging (fMRI) has been widely utilized to diagnose neurological disease, by exploiting the region of interest (RoI) nodes as well as their connectivities in human brain. However, most of existing works only rely on either RoIs or connectivities, neglecting the potential for complementary information between them. To address this issue, we study how to discover the rich cross-view information in fMRI data of human brain. This paper presents a novel method for cross-view analysis of fMRI data of the human brain, called Cross-view transFormers (CvFormer). CvFormer employs RoI and connectivity encoder modules to generate two separate views of the human brain, represented as RoI and sub-connectivity tokens. Then, basic transformer modules can be used to process the RoI and sub-connectivity tokens, and cross-view modules integrate the complement information across two views. Furthermore, CvFormer uses a global token for each branch as a query to exchange information with other branches in cross-view modules, which only requires linear time for both computational and memory complexity instead of quadratic time. To enhance the robustness of the proposed CvFormer, we propose a two-stage strategy to train its parameters. To be specific, RoI and connectivity views can be firstly utilized as self-supervised information to pre-train the CvFormer by combining it with contrastive learning and then fused to finetune the CvFormer using label information. Experiment results on two public ABIDE and ADNI datasets can show clear improvements by the proposed CvFormer, which can validate its effectiveness and superiority.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142272723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
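The cross-view exchange described above, where each branch's single global token queries the other branch's token sequence so the cost stays linear in sequence length, maps onto an ordinary cross-attention call. A minimal sketch (the dimensions, residual wiring, and the convention that token 0 of each view is its global token are assumptions):

```python
import torch
import torch.nn as nn

class CrossViewExchange(nn.Module):
    """Each view's global token queries the other view's token sequence."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.roi_from_conn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conn_from_roi = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, roi_tokens, conn_tokens):
        # roi_tokens, conn_tokens: (B, L, D); token 0 of each view is its global token
        roi_g, conn_g = roi_tokens[:, :1], conn_tokens[:, :1]
        roi_g_new, _ = self.roi_from_conn(roi_g, conn_tokens, conn_tokens)   # 1 query -> O(L)
        conn_g_new, _ = self.conn_from_roi(conn_g, roi_tokens, roi_tokens)
        roi_tokens = torch.cat([roi_g + roi_g_new, roi_tokens[:, 1:]], dim=1)
        conn_tokens = torch.cat([conn_g + conn_g_new, conn_tokens[:, 1:]], dim=1)
        return roi_tokens, conn_tokens
```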
DRGNN: Disentangled representation graph neural network for diverse category-level recommendations
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-09-16 DOI: 10.1016/j.patrec.2024.09.008
{"title":"DRGNN: Disentangled representation graph neural network for diverse category-level recommendations","authors":"","doi":"10.1016/j.patrec.2024.09.008","DOIUrl":"10.1016/j.patrec.2024.09.008","url":null,"abstract":"<div><p>Graph neural networks (GNNs) have significantly advanced recommender systems (RecSys) by enhancing their accuracy in complex collaborative filtering scenarios. However, this progress often comes at the cost of overlooking the diversity of recommendations, a factor in user satisfaction. Addressing this gap, this paper introduces the disentangled representation graph neural network (DRGNN). DRGNN integrates diversification into the candidate generation stage using two specialized modules. The first employs disentangled representation learning to separate item preferences from category preferences, thereby mitigating category bias in recommendations. The second module, focusing on positive sample selection, further reduces category bias. This approach not only maintains the high-order connectivity strengths of GNNs but also substantially improves the diversity of recommendations. Our extensive validation of DRGNN on three comprehensive web service datasets, Taobao, Amazon Beauty and MSD, shows that it not only matches the state-of-the-art methods in accuracy but also excels in achieving a balanced trade-off between accuracy and diversity in recommendations.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142242542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
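The abstract does not detail the disentangling mechanism, so the sketch below is only a generic factorized-scoring illustration of separating item-level from category-level preference (every name and the scoring form are assumptions, not the DRGNN architecture): the user gets two embedding tables, and down-weighting the category term at ranking time trades a little accuracy for category diversity.

```python
import torch
import torch.nn as nn

class DisentangledScorer(nn.Module):
    """Illustration only: user preference split into an item factor and a
    category factor, scored separately and combined."""

    def __init__(self, n_users, n_items, n_cats, dim=64):
        super().__init__()
        self.user_item = nn.Embedding(n_users, dim)   # item-level taste
        self.user_cat = nn.Embedding(n_users, dim)    # category-level taste
        self.item = nn.Embedding(n_items, dim)
        self.cat = nn.Embedding(n_cats, dim)

    def forward(self, u, i, c, cat_weight=1.0):
        item_score = (self.user_item(u) * self.item(i)).sum(-1)
        cat_score = (self.user_cat(u) * self.cat(c)).sum(-1)
        # lowering cat_weight at ranking time reduces category bias in the top-N list
        return item_score + cat_weight * cat_score
```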
Hierarchical Glocal Attention Pooling for Graph Classification
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2024-09-13 DOI: 10.1016/j.patrec.2024.09.009
{"title":"Hierarchical Glocal Attention Pooling for Graph Classification","authors":"","doi":"10.1016/j.patrec.2024.09.009","DOIUrl":"10.1016/j.patrec.2024.09.009","url":null,"abstract":"<div><p>Graph pooling is an essential operation in Graph Neural Networks that reduces the size of an input graph while preserving its core structural properties. Existing pooling methods find a compressed representation considering the Global Topological Structures (e.g., cliques, stars, clusters) or Local information at node level (e.g., top-<span><math><mi>k</mi></math></span> informative nodes). However, an effective graph pooling method does not hierarchically integrate both Global and Local graph properties. To this end, we propose a dual-fold Hierarchical Global Local Attention Pooling (HGLA-Pool) layer that exploits the aforementioned graph properties, generating more robust graph representations. Exhaustive experiments on nine publicly available graph classification benchmarks under standard metrics show that HGLA-Pool significantly outperforms eleven state-of-the-art models on seven datasets while being on par for the remaining two.</p></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":null,"pages":null},"PeriodicalIF":3.9,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142242541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
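The local half of such a pooling layer, keeping the top-k nodes by a learned attention score and gating their features, is the easiest part to make concrete. A minimal sketch in plain PyTorch on a single dense graph (the global, structure-aware half and the hierarchical stacking are omitted; the pooling ratio is an assumption):

```python
import torch
import torch.nn as nn

class TopKAttentionPool(nn.Module):
    """Local attention pooling: score nodes, keep the top-k, gate kept features."""

    def __init__(self, dim, ratio=0.5):
        super().__init__()
        self.score = nn.Linear(dim, 1)
        self.ratio = ratio

    def forward(self, x, adj):
        # x: (N, D) node features, adj: (N, N) adjacency of one graph
        s = torch.tanh(self.score(x).squeeze(-1))        # (N,) attention scores
        k = max(1, int(self.ratio * x.size(0)))
        scores, idx = torch.topk(s, k)
        x_pooled = x[idx] * scores.unsqueeze(-1)         # gate features by their score
        adj_pooled = adj[idx][:, idx]                    # induced subgraph on kept nodes
        return x_pooled, adj_pooled, idx
```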