Computer Vision and Image Understanding: Latest Articles

DA²: Distribution-agnostic adaptive feature adaptation for one-class classification
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 251, Article 104256 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104256
Zilong Zhang, Zhibin Zhao, Xingwu Zhang, Xuefeng Chen
Abstract: One-class classification (OCC), i.e., identifying whether an example belongs to the same distribution as the training data, is essential for deploying machine learning models in the real world. Adapting pre-trained features on the target dataset has proven to be a promising paradigm for improving OCC performance. Existing methods are constrained by assumptions about the training distribution, which contradicts the real scenario where the data distribution is unknown. In this work, we propose a simple distribution-agnostic adaptive feature adaptation method (DA²). The core idea is to adaptively cluster the features of every class more tightly depending on the properties of the data. We rely on the prior that the augmentation distributions of intra-class samples overlap, then align the features of different augmentations of every sample with a non-contrastive method. We find that training a randomly initialized predictor degrades the pre-trained backbone in the non-contrastive method. To tackle this problem, we design a learnable symmetric predictor and initialize it based on eigenspace alignment theory. Benchmarks and the proposed challenging near-distribution experiments substantiate the capability of our method across various data distributions. Furthermore, we find that utilizing DA² can immensely mitigate the long-standing catastrophic forgetting in feature adaptation for OCC. Code will be released upon acceptance.
Citations: 0
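To make the non-contrastive alignment described above concrete, here is a minimal PyTorch sketch, not the authors' released code: two augmented views of the same sample are encoded, and a weight-tied symmetric linear predictor maps each view onto the stop-gradiented other, in the style of SimSiam-like objectives. The identity initialization merely stands in for the eigenspace-alignment-based initialization; the backbone, dimensions, and module names are illustrative assumptions.

```python
# Sketch of non-contrastive view alignment with a symmetric predictor (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SymmetricPredictor(nn.Module):
    """Linear predictor constrained to a symmetric weight matrix W = (A + A^T) / 2."""
    def __init__(self, dim):
        super().__init__()
        self.weight = nn.Parameter(torch.eye(dim))  # identity init as a stand-in for the
                                                     # eigenspace-alignment-based init
    def forward(self, z):
        w_sym = 0.5 * (self.weight + self.weight.t())
        return z @ w_sym

def non_contrastive_loss(backbone, predictor, x_aug1, x_aug2):
    """Negative cosine similarity between each predicted view and the stopped other view."""
    z1, z2 = backbone(x_aug1), backbone(x_aug2)
    p1, p2 = predictor(z1), predictor(z2)
    return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2

if __name__ == "__main__":
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # toy encoder
    predictor = SymmetricPredictor(128)
    x1, x2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)        # two augmented views
    print(non_contrastive_loss(backbone, predictor, x1, x2).item())
```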
Spatio-Temporal Dynamic Interlaced Network for 3D human pose estimation in video
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 251, Article 104258 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104258
Feiyi Xu, Jifan Wang, Ying Sun, Jin Qi, Zhenjiang Dong, Yanfei Sun
Abstract: Recent transformer-based methods have achieved excellent performance in 3D human pose estimation. The distinguishing characteristic of the transformer lies in its equitable treatment of each token, encoding each independently. When applied to the human skeleton, the transformer regards each joint as an equally significant token. This can lead to a lack of clarity in the extraction of connection relationships between joints, thus affecting the accuracy of relationship information. In addition, the transformer treats each frame of a temporal sequence equally. This design can introduce a lot of redundant information in short frames with frequent action changes, which can have a negative impact on learning temporal correlations. To alleviate the above issues, we propose an end-to-end framework, a Spatio-Temporal Dynamic Interlaced Network (S-TDINet), including a dynamic spatial GCN encoder (DSGCE) and an interlaced temporal transformer encoder (ITTE). In the DSGCE module, we design three adaptive adjacency matrices to model spatial correlation from static and dynamic perspectives. In the ITTE module, we introduce a global–local interlaced mechanism to mitigate potential interference from redundant information in fast motion scenarios, thereby achieving more accurate temporal correlation modeling. Finally, we conduct extensive experiments and validate the effectiveness of our approach on two widely recognized benchmark datasets: Human3.6M and MPI-INF-3DHP.
Citations: 0
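The DSGCE module's adaptive adjacency matrices can be illustrated with a small sketch. The code below is an assumption rather than the paper's implementation: it combines a fixed skeletal adjacency with a freely learnable matrix inside a single graph-convolution layer over joints. Joint count, feature sizes, and the softmax normalization are illustrative choices.

```python
# Sketch of a graph convolution over skeleton joints with a learnable adaptive adjacency.
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_joints, skeleton_adj):
        super().__init__()
        self.register_buffer("A_static", skeleton_adj)                       # fixed bone connectivity
        self.A_adaptive = nn.Parameter(torch.zeros(num_joints, num_joints))  # learned offsets
        self.proj = nn.Linear(in_ch, out_ch)

    def forward(self, x):                        # x: (batch, joints, channels)
        A = torch.softmax(self.A_static + self.A_adaptive, dim=-1)
        return self.proj(A @ x)                  # aggregate neighbours, then project

if __name__ == "__main__":
    J = 17                                       # Human3.6M-style joint count (assumed)
    skeleton = torch.eye(J)                      # placeholder adjacency (self-loops only)
    layer = AdaptiveGraphConv(2, 64, J, skeleton)
    poses_2d = torch.randn(4, J, 2)              # batch of 2D joint coordinates
    print(layer(poses_2d).shape)                 # -> torch.Size([4, 17, 64])
```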
Gaussian Splatting with NeRF-based color and opacity
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 251, Article 104273 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104273
Dawid Malarz, Weronika Smolak-Dyżewska, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek
Abstract: Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. NeRFs excel at producing strikingly sharp novel views of 3D objects by encoding the shape and color information within neural network weights. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding their versatility. In contrast, Gaussian Splatting (GS) offers similar render quality with faster training and inference, as it does not need neural networks to work. It encodes information about the 3D objects in a set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS is difficult to condition since its representation is fully explicit. To mitigate the caveats of both models, we propose a hybrid model, Viewing Direction Gaussian Splatting (VDGS), that uses a GS representation of the 3D object's shape and NeRF-based encoding of opacity. Our model uses Gaussian distributions with trainable positions (i.e., means of the Gaussians), shape (i.e., covariance of the Gaussians), and opacity, together with a neural network that takes the Gaussian parameters and viewing direction to produce changes in the said opacity. As a result, our model better describes shadows, light reflections, and the transparency of 3D objects without adding additional texture and light components.
Citations: 0
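The opacity branch of such a hybrid can be pictured as a small MLP that perturbs each Gaussian's explicit opacity based on its parameters and the viewing direction. The sketch below is a rough illustration under assumed feature sizes and a simple concatenation scheme; it is not the released VDGS code.

```python
# Sketch of view-dependent opacity modulation for explicit Gaussians (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewDependentOpacity(nn.Module):
    def __init__(self, gaussian_feat_dim=10, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(gaussian_feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),          # bounded opacity offset
        )

    def forward(self, gaussian_params, view_dirs, base_opacity):
        # gaussian_params: (N, feat) flattened means/covariances, view_dirs: (N, 3) unit vectors
        delta = self.mlp(torch.cat([gaussian_params, view_dirs], dim=-1))
        return (base_opacity + delta).clamp(0.0, 1.0)  # keep opacity in a valid range

if __name__ == "__main__":
    model = ViewDependentOpacity()
    params = torch.randn(1000, 10)
    dirs = F.normalize(torch.randn(1000, 3), dim=-1)
    alpha = torch.rand(1000, 1)                        # explicit per-Gaussian base opacity
    print(model(params, dirs, alpha).shape)            # -> torch.Size([1000, 1])
```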
As-Global-As-Possible stereo matching with Sparse Depth Measurement Fusion
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 251, Article 104268 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104268
Peng Yao, Haiwei Sang
Abstract: The recently lauded methodologies of As-Global-As-Possible (AGAP) and Sparse Depth Measurement Fusion (SDMF) have emerged as celebrated solutions for tackling the issue of stereo matching. AGAP addresses the congenital shortcomings of Semi-Global Matching (SGM) in terms of streaking effects, while SDMF leverages active depth sensors to boost disparity computation. In this paper, these two methods are intertwined to attain superior disparity estimation. Random sparse depth measurements are fused with Diffusion-Based Fusion to update AGAP's matching costs. Then, Neighborhood-Based Fusion refines the costs further, leveraging the previous results. Ultimately, a segment-based disparity refinement strategy is utilized to handle outliers and mismatched pixels and achieve the final disparity results. Performance evaluations on various stereo datasets demonstrate that the proposed algorithm not only surpasses other challenging stereo matching algorithms but also achieves near real-time efficiency. It is worth pointing out that our proposal surprisingly outperforms most of the deep-learning-based stereo matching algorithms on the Middlebury v.3 online evaluation system, despite not utilizing any learning-based techniques, further validating its superiority and practicality.
Citations: 0
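How sparse depth measurements might enter a dense matching-cost volume can be illustrated with a simple Gaussian reweighting around each measured disparity. The sketch below is only an assumption standing in for the paper's diffusion-based and neighborhood-based fusion steps; the weighting scheme and parameters are illustrative.

```python
# Sketch of fusing sparse disparity measurements into a cost volume (assumed scheme).
import numpy as np

def fuse_sparse_disparity(cost_volume, sparse_disp, sigma=2.0, weight=0.5):
    """cost_volume: (H, W, D) matching costs; sparse_disp: (H, W), NaN where unmeasured."""
    H, W, D = cost_volume.shape
    fused = cost_volume.copy()
    d_range = np.arange(D, dtype=np.float32)
    ys, xs = np.where(~np.isnan(sparse_disp))
    for y, x in zip(ys, xs):
        # penalty grows with distance from the measured disparity at (y, x)
        penalty = 1.0 - np.exp(-((d_range - sparse_disp[y, x]) ** 2) / (2 * sigma ** 2))
        fused[y, x] = (1 - weight) * fused[y, x] + weight * penalty * fused[y, x].max()
    return fused

if __name__ == "__main__":
    H, W, D = 8, 8, 32
    costs = np.random.rand(H, W, D).astype(np.float32)
    meas = np.full((H, W), np.nan, dtype=np.float32)
    meas[4, 4] = 12.0                                   # one sparse depth measurement
    fused = fuse_sparse_disparity(costs, meas)
    print(int(fused[4, 4].argmin()))                    # winner is pulled toward disparity 12
```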
Learning to mask and permute visual tokens for Vision Transformer pre-training
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 252, Article 104294 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2025.104294
Lorenzo Baraldi, Roberto Amoroso, Marcella Cornia, Lorenzo Baraldi, Andrea Pilzer, Rita Cucchiara
Abstract: The use of self-supervised pre-training has emerged as a promising approach to enhance the performance of many different visual tasks. In this context, recent approaches have employed the Masked Image Modeling paradigm, which pre-trains a backbone by reconstructing visual tokens associated with randomly masked image patches. This masking approach, however, introduces noise into the input data during pre-training, leading to discrepancies that can impair performance during the fine-tuning phase. Furthermore, input masking neglects the dependencies between corrupted patches, increasing the inconsistencies observed in downstream fine-tuning tasks. To overcome these issues, we propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT), that employs autoregressive and permuted predictions to capture intra-patch dependencies. In addition, MaPeT employs auxiliary positional information to reduce the disparity between the pre-training and fine-tuning phases. In our experiments, we employ a fair setting to ensure reliable and meaningful comparisons and conduct investigations on multiple visual tokenizers, including our proposed k-CLIP, which directly employs discretized CLIP features. Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting. We release an implementation of our code and models at https://github.com/aimagelab/MaPeT.
Citations: 0
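A permuted autoregressive objective can be illustrated by the attention mask it implies: sample a random order over patch positions and let each position attend only to positions that precede it in that order. The sketch below shows this mask construction under assumed shapes; it is not the released MaPeT code and omits the auxiliary positional information.

```python
# Sketch of building a permuted causal attention mask over image patches.
import torch

def permuted_causal_mask(num_patches, generator=None):
    order = torch.randperm(num_patches, generator=generator)   # sampled prediction order
    rank = torch.empty(num_patches, dtype=torch.long)
    rank[order] = torch.arange(num_patches)                    # rank of each patch position
    # allowed[i, j] is True if patch j comes strictly before patch i in the permutation
    allowed = rank.unsqueeze(1) > rank.unsqueeze(0)
    return order, allowed

if __name__ == "__main__":
    order, mask = permuted_causal_mask(6, generator=torch.Generator().manual_seed(0))
    print(order.tolist())
    print(mask.int())    # row i marks which patches position i may attend to
```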
Graph-based Moving Object Segmentation for underwater videos using semi-supervised learning
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 252, Article 104290 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2025.104290
Meghna Kapoor, Wieke Prummel, Jhony H. Giraldo, Badri Narayan Subudhi, Anastasia Zakharova, Thierry Bouwmans, Ankur Bansal
Abstract: Moving object segmentation (MOS) using passive underwater image processing is an important technology for monitoring marine habitats. It aids marine biologists studying biological oceanography and the associated fields of chemical, physical, and geological oceanography to understand marine organisms. Dynamic backgrounds due to marine organisms like algae and seaweed, and improper illumination of the environment, pose challenges in detecting moving objects in the scene. Previous graph-learning methods have shown promising results in MOS but are mostly limited to terrestrial surface videos such as traffic video surveillance. Traditional object modeling fails in underwater scenes due to fish shape and color degradation in motion and the lack of extensive underwater datasets for deep-learning models. Therefore, we propose a semi-supervised graph-learning approach (GraphMOS-U) to segment moving objects in underwater environments. Additionally, existing datasets were consolidated to form the proposed Teleost Fish Classification Dataset, specifically designed for fish classification tasks in complex environments to avoid unseen scenes, ensuring the replication of the transfer learning process on a ResNet-50 backbone. GraphMOS-U uses a six-step approach with transfer learning using Mask R-CNN and a ResNet-50 backbone for instance segmentation, followed by feature extraction using optical flow, visual saliency, and texture. After concatenating these features, a k-NN graph is constructed, and graph node classification is applied to label objects as foreground or background. The foreground nodes are used to reconstruct the segmentation map of the moving object from the scene. Quantitative and qualitative experiments demonstrate that GraphMOS-U outperforms state-of-the-art algorithms, accurately detecting moving objects while preserving fine details. The proposed method enables the use of graph-based MOS algorithms in underwater scenes.
Citations: 0
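The graph node classification step can be approximated with an off-the-shelf semi-supervised learner. The sketch below uses scikit-learn's k-NN-kernel label spreading as a stand-in for the paper's graph classifier: each node is a segmented instance described by a feature vector (optical flow, saliency, and texture in the paper; random features here), a few nodes carry foreground/background labels, and labels propagate over the k-NN graph. Feature dimensionality and neighbor count are assumptions.

```python
# Sketch of semi-supervised foreground/background labeling over a k-NN graph of instances.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))          # one row per segmented object instance
labels = np.full(200, -1)                      # -1 marks unlabeled nodes
labels[:10] = 1                                # a few known foreground nodes
labels[10:20] = 0                              # a few known background nodes

model = LabelSpreading(kernel="knn", n_neighbors=10)
model.fit(features, labels)
pred = model.transduction_                     # labels inferred for every node
print(pred[:30])
```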
Illumination-aware and structure-guided transformer for low-light image enhancement
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 252, Article 104276 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104276
Guodong Fan, Zishu Yao, Min Gan
Abstract: In this paper, we propose a novel illumination-aware and structure-guided transformer that achieves efficient image enhancement by focusing on brightness degradation and precise high-frequency guidance. Specifically, low-light images often contain numerous regions with similar brightness levels but different spatial locations. However, existing attention mechanisms compute self-attention only over channel dimensions or fixed-size spatial blocks, which limits their ability to capture long-range features, making it challenging to achieve satisfactory image restoration quality. At the same time, the details of low-light images are mostly hidden in the darkness. However, existing models often give equal attention to both high-frequency and smooth regions, which makes it difficult to capture the details of deep degradation, resulting in blurry recovered image details. On the one hand, we introduce a dynamic brightness multi-domain self-attention mechanism that selectively focuses on spatial features within dynamic ranges and incorporates frequency-domain information. This approach allows the model to capture both local details and global features, restoring global brightness while paying closer attention to regions with similar degradation. On the other hand, we propose a global maximum gradient search strategy to guide the model's attention toward high-frequency detail regions, thereby achieving a more fine-grained restored image. Extensive experiments on various benchmark datasets demonstrate that our method achieves state-of-the-art performance.
Citations: 0
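A gradient-magnitude map gives a rough picture of the kind of high-frequency guidance signal described above. The sketch below computes Sobel gradients and normalizes them by the global maximum to obtain per-pixel weights; it only illustrates the guidance idea and is not the paper's global maximum gradient search strategy.

```python
# Sketch of a gradient-based weight map that emphasizes high-frequency detail regions.
import torch
import torch.nn.functional as F

def gradient_weight_map(img):
    """img: (B, 1, H, W) grayscale in [0, 1]; returns per-pixel weights in [0, 1]."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2)
    return mag / (mag.amax(dim=(2, 3), keepdim=True) + 1e-8)   # normalize by global max

if __name__ == "__main__":
    x = torch.rand(2, 1, 64, 64)
    w = gradient_weight_map(x)
    print(w.shape, float(w.max()))     # weights peak at the strongest edges
```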
YES: You should Examine Suspect cues for low-light object detection
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 251, Article 104271 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104271
Shu Ye, Wenxin Huang, Wenxuan Liu, Liang Chen, Xiao Wang, Xian Zhong
Abstract: Object detection in low-light conditions presents substantial challenges, particularly the issue we define as "low-light object-background cheating". This phenomenon arises from uneven lighting, leading to blurred and inaccurate object edges. Most existing methods focus on basic feature enhancement and addressing the gap between normal-light and synthetic low-light conditions. However, they often overlook the complexities introduced by uneven lighting in real-world environments. To address this, we propose a novel low-light object detection framework, You Examine Suspect (YES), comprising two key components: the Optical Balance Enhancer (OBE) and the Entanglement Attenuation Module (EAM). The OBE emphasizes "balance" by employing techniques such as inverse tone mapping, white balance, and gamma correction to recover details in dark regions while adjusting brightness and contrast without introducing noise. The EAM focuses on "disentanglement" by analyzing both object regions and surrounding areas affected by lighting variations and integrating multi-scale contextual information to clarify ambiguous features. Extensive experiments on the ExDark and Dark Face datasets demonstrate the superior performance of the proposed YES, validating its effectiveness in low-light object detection tasks. The code will be available at https://github.com/Regina971/YES.
Citations: 0
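Two of the classical operations the OBE is described as building on, gray-world white balance and gamma correction, are sketched below for an RGB image in [0, 1]. The gamma value is illustrative, and the paper's inverse tone mapping and balancing scheme are not reproduced here.

```python
# Sketch of gray-world white balance followed by gamma correction (illustrative parameters).
import numpy as np

def gray_world_white_balance(img):
    """img: (H, W, 3) float RGB; scale each channel so channel means match the gray mean."""
    means = img.reshape(-1, 3).mean(axis=0)
    gain = means.mean() / (means + 1e-8)
    return np.clip(img * gain, 0.0, 1.0)

def gamma_correct(img, gamma=0.6):
    """gamma < 1 brightens dark regions without clipping highlights."""
    return np.power(np.clip(img, 0.0, 1.0), gamma)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low_light = rng.uniform(0.0, 0.2, size=(4, 4, 3))      # toy dark image
    enhanced = gamma_correct(gray_world_white_balance(low_light))
    print(low_light.mean(), enhanced.mean())                # mean brightness increases
```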
Multi-domain conditional prior network for water-related optical image enhancement
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 251, Article 104251 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104251
Tianyu Wei, Dehuan Zhang, Zongxin He, Rui Zhou, Xiangfu Meng
Abstract: Water-related optical image enhancement improves the perception of information for human and machine vision, facilitating the development and utilization of marine resources. Due to the absorption and scattering of light in different water media, water-related optical images typically suffer from color distortion and low contrast. However, existing enhancement methods struggle to accurately simulate the imaging process in real underwater environments. To model and invert the degradation process of water-related optical images, we propose a Multi-domain Conditional Prior Network (MCPN), based on a color vector prior and a spectrum vector prior, for enhancing water-related optical images. MCPN captures color, luminance, and structural priors across different feature spaces, resulting in a lightweight architecture that enhances water-related optical images while preserving critical information fidelity. Specifically, MCPN includes a modulated network and a conditional network comprising two conditional units. The modulated network is a lightweight convolutional neural network responsible for image reconstruction and local feature refinement. To avoid feature loss from multiple extractions, the Gaussian Conditional Unit (GCU) extracts atmospheric light and color shift information from the input image to form color prior vectors. Simultaneously, incorporating the Fast Fourier Transform, the Spectrum Conditional Unit (SCU) extracts scene brightness and structure to form spectrum prior vectors. These prior vectors are embedded into the modulated network to guide the image reconstruction. MCPN utilizes a PAL-based weighted Selective Supervision (PSS) strategy, selectively adjusting learning weights for images with excessive artificial noise. Experimental results demonstrate that MCPN outperforms existing methods, achieving excellent performance on the UIEB dataset. The PSS also shows fine feature matching in downstream applications.
Citations: 0
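A spectrum prior vector of the kind the SCU produces can be sketched with the Fast Fourier Transform: transform the image, compress and center the magnitude spectrum, and pool it into a short descriptor that a conditional network could embed into the modulated network. The pooling size and normalization below are assumptions, not the paper's SCU design.

```python
# Sketch of extracting a pooled FFT magnitude descriptor as a spectrum prior vector.
import torch
import torch.nn.functional as F

def spectrum_prior(img, pooled=8):
    """img: (B, C, H, W); returns (B, C * pooled * pooled) spectrum descriptors."""
    spec = torch.fft.fft2(img, norm="ortho")
    mag = torch.log1p(spec.abs())                    # compress dynamic range
    mag = torch.fft.fftshift(mag, dim=(-2, -1))      # center low frequencies
    pooled_mag = F.adaptive_avg_pool2d(mag, pooled)  # summarize the spectrum coarsely
    return pooled_mag.flatten(1)

if __name__ == "__main__":
    x = torch.rand(2, 3, 128, 128)
    v = spectrum_prior(x)
    print(v.shape)    # -> torch.Size([2, 192])
```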
UAV-based person re-identification: A survey of UAV datasets, approaches, and challenges
IF 4.3 | CAS Region 3 | Computer Science
Computer Vision and Image Understanding, Vol. 251, Article 104261 | Pub Date: 2025-02-01 | DOI: 10.1016/j.cviu.2024.104261
Yousaf Albaluchi, Biying Fu, Naser Damer, Raghavendra Ramachandra, Kiran Raja
Abstract: Person re-identification (ReID) has gained significant interest due to growing public safety concerns that require advanced surveillance and identification mechanisms. While most existing ReID research relies on static surveillance cameras, the use of Unmanned Aerial Vehicles (UAVs) for surveillance has recently gained popularity. Noting the promising application of UAVs in ReID, this paper presents a comprehensive overview of UAV-based ReID, highlighting publicly available datasets, key challenges, and methodologies. We summarize and consolidate evaluations conducted across multiple studies, providing a unified perspective on the state of UAV-based ReID research. Despite their limited size and diversity, we underscore the importance of current datasets in advancing UAV-based ReID research. The survey also presents a list of all available approaches for UAV-based ReID, along with the challenges associated with UAV-based ReID, including environmental conditions, image quality issues, and privacy concerns. We discuss dynamic adaptation techniques, multi-model fusion, and lightweight algorithms to leverage ground-based person ReID datasets for UAV applications. Finally, we explore potential research directions, highlighting the need for diverse datasets, lightweight algorithms, and innovative approaches to tackle the unique challenges of UAV-based person ReID.
Citations: 0