IET Computer Vision: Latest Articles

The Generated-bbox Guided Interactive Image Segmentation With Vision Transformers
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-04-24 DOI: 10.1049/cvi2.70019
Shiyin Zhang, Yafei Dong, Shuang Qiu
Existing click-based interactive image segmentation methods typically initiate object extraction with the first click and iteratively refine the coarse segmentation through subsequent interactions. Unlike box-based methods, click-based approaches mitigate ambiguity when multiple targets are present within a single bounding box, but suffer from a lack of precise location and outline information. Inspired by instance segmentation, the authors propose a Generated-bbox Guided method that provides location and outline information using an automatically generated bounding box, rather than a manually labelled one, minimising the need for extensive user interaction. Building on the success of vision transformers, the authors adopt them as the network architecture to enhance the model's performance. They propose a click-based interactive image segmentation network named the Generated-bbox Guided Coarse-to-Fine Network (GCFN), a two-stage cascade network comprising two sub-networks: Coarsenet and Finenet. A transformer-based Box Detector generates an initial bounding box from an inside click, providing location and outline information. Additionally, two feature enhancement modules guided by foreground and background information are designed: the Foreground-Background Feature Enhancement Module (FFEM) and the Pixel Enhancement Module (PEM). The authors evaluate GCFN on five popular benchmark datasets and demonstrate its generalisation capability on three medical image datasets.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70019
Citations: 0
Structure-Based Uncertainty Estimation for Source-Free Active Domain Adaptation
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-04-16 DOI: 10.1049/cvi2.70020
Jihong Ouyang, Zhengjie Zhang, Qingyi Meng, Jinjin Chi
Active domain adaptation (active DA) provides an effective solution by selectively labelling a limited number of target samples to significantly enhance adaptation performance. However, existing active DA methods often struggle in real-world scenarios where, due to data privacy concerns, only a pre-trained source model is available, rather than the source samples. To address this issue, we propose a novel method called the structure-based uncertainty estimation model (SUEM) for source-free active domain adaptation (SFADA). Specifically, we introduce an active sample selection strategy that combines uncertainty and diversity sampling to identify the most informative samples. We assess the uncertainty of target samples using structure-wise probabilities and implement a diversity selection method to minimise redundancy. For the selected samples, we not only apply a standard supervised loss but also conduct interpolation consistency training to further explore the structural information of the target domain. Extensive experiments across four widely used datasets demonstrate that our method is comparable to or outperforms current unsupervised domain adaptation (UDA) and active DA methods.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70020
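The general idea of combining uncertainty and diversity sampling can be sketched as follows. This is only an illustration under stated assumptions, not the SUEM method: the abstract does not define its structure-wise probabilities, so plain predictive entropy stands in for the uncertainty score, and a greedy farthest-point rule stands in for the diversity step. The names `entropy`, `select_samples` and `pool_factor` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_samples(probs, feats, budget, pool_factor=3):
    """Pick up to `budget` sample indices to label.

    Step 1 (uncertainty): shortlist the budget * pool_factor samples with the
    highest predictive entropy. Entropy is an assumed stand-in score here.
    Step 2 (diversity): greedily add the shortlisted sample whose feature
    vector is farthest from everything already chosen, to limit redundancy.
    """
    order = sorted(range(len(probs)), key=lambda i: entropy(probs[i]), reverse=True)
    pool = order[: budget * pool_factor]
    if not pool or budget <= 0:
        return []
    chosen = [pool[0]]  # start from the most uncertain sample
    while len(chosen) < min(budget, len(pool)):
        best = max(
            (i for i in pool if i not in chosen),
            key=lambda i: min(math.dist(feats[i], feats[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    # Toy data: softmax outputs and 2-D feature vectors for four target samples.
    probs = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
    feats = [[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [3.0, 3.0]]
    print(select_samples(probs, feats, budget=2, pool_factor=1))
```

A larger `pool_factor` gives the diversity step more candidates to choose from, at the cost of admitting less uncertain samples into the shortlist.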
Citations: 0
Synchronised and Fine-Grained Head for Skeleton-Based Ambiguous Action Recognition
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-04-15 DOI: 10.1049/cvi2.70016
Hao Huang, Yujie Lin, Siyu Chen, Haiyang Liu
Skeleton-based action recognition using Graph Convolutional Networks (GCNs) has achieved remarkable performance, but recognising ambiguous actions, such as 'waving' and 'saluting', remains a significant challenge. Existing methods typically rely on a serial combination of GCNs and Temporal Convolutional Networks (TCNs), where spatial and temporal features are extracted independently, leading to unbalanced spatial-temporal information, which hinders accurate action recognition. Moreover, existing methods for ambiguous actions often overemphasise local details, resulting in the loss of crucial global context, which further complicates the task of differentiating ambiguous actions. To address these challenges, the authors propose a lightweight plug-and-play module called Synchronised and Fine-grained Head (SF-Head), inserted between GCN and TCN layers. SF-Head first conducts Synchronised Spatial-Temporal Extraction (SSTE) with a Feature Redundancy Loss (F-RL), ensuring a balanced interaction between the two types of features. It then performs Adaptive Cross-dimensional Feature Aggregation (AC-FA) with a Feature Consistency Loss (F-CL), which aligns the aggregated features with their original spatial-temporal features. This aggregation step combines global context and local details, enhancing the model's ability to classify ambiguous actions. Experimental results on the NTU RGB+D 60, NTU RGB+D 120, NW-UCLA and PKU-MMD I datasets demonstrate significant improvements in distinguishing ambiguous actions. The code will be made available at https://github.com/HaoHuang2003/SFHead.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70016
Citations: 0
EDG-CDM: A New Encoder-Guided Conditional Diffusion Model-Based Image Synthesis Method for Limited Data
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-04-08 DOI: 10.1049/cvi2.70018
Haopeng Lei, Hao Yin, Kaijun Liang, Mingwen Wang, Jinshan Zeng, Guoliang Luo
The Diffusion Probabilistic Model (DM) has emerged as a powerful generative model in the field of image synthesis, capable of producing high-quality and realistic images. However, training a DM requires a large and diverse dataset, which can be challenging to obtain. This requirement weakens the model's generalisation and robustness when training data is limited. To address this issue, EDG-CDM, an encoder-guided conditional diffusion model, is proposed for image synthesis with limited data. Firstly, the authors pre-train the encoder by introducing noise to capture the distribution of image features and generate the condition vector through contrastive learning and KL divergence. Next, the encoder undergoes further training with classification to integrate image class information, providing more favourable and versatile conditions for the diffusion model. Subsequently, the encoder is connected to the diffusion model, which is trained using all available data with encoder-provided conditions. Finally, the authors evaluate EDG-CDM on various public datasets with limited data, conducting extensive experiments and comparing the results with state-of-the-art methods using metrics such as Fréchet Inception Distance (FID) and Inception Score (IS). The experiments demonstrate that EDG-CDM outperforms existing models by consistently achieving the lowest FID scores and the highest IS scores, highlighting its effectiveness in generating high-quality and diverse images with limited training data. These results underscore the significance of EDG-CDM in advancing image synthesis techniques under data-constrained scenarios.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70018
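For readers unfamiliar with the two metrics named in the abstract, their standard definitions are reproduced here as a reminder. The notation is not taken from the abstract and is an assumption of this note: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of Inception features for real and generated images, $p_g$ is the generator distribution, and $p(y \mid x)$ is the Inception classifier's label distribution for image $x$.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

\mathrm{IS} = \exp\!\left( \mathbb{E}_{x \sim p_g}\,
  D_{\mathrm{KL}}\!\left( p(y \mid x) \,\Vert\, p(y) \right) \right)
```

A lower FID and a higher IS are better, which is why the abstract reports the lowest FID and the highest IS as the favourable outcome.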
Citations: 0
Performance of Computer Vision Algorithms for Fine-Grained Classification Using Crowdsourced Insect Images
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-04-04 DOI: 10.1049/cvi2.70006
Rita Pucci, Vincent J. Kalkman, Dan Stowell
With fine-grained classification, we identify unique characteristics to distinguish among classes of the same super-class. We focus on species recognition in Insecta, as insects are critical for biodiversity monitoring and at the base of many ecosystems. With citizen science campaigns, billions of images are collected in the wild. Once these are labelled, experts can use them to create distribution maps. However, the labelling process is time consuming, which is where computer vision comes in. The field of computer vision offers a wide range of algorithms, each with its strengths and weaknesses; how do we identify the algorithm that suits our application? To answer this question, we provide a full and detailed evaluation of nine algorithms among deep convolutional networks (CNNs), vision transformers (ViTs) and locality-based vision transformers (LBVTs) on four different aspects: classification performance, embedding quality, computational cost and gradient activity. We offer insights not previously available in this domain, showing the extent to which these algorithms solve fine-grained tasks in Insecta. We found that ViTs perform best on inference speed and computational cost, whereas LBVTs outperform the others on classification performance and embedding quality; the CNNs provide a trade-off among the metrics.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70006
Citations: 0
Foundation Model Based Camouflaged Object Detection
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-04-01 DOI: 10.1049/cvi2.70009
Zefeng Chen, Zhijiang Li, Yunqi Xue, Li Zhang
Camouflaged object detection (COD) aims to identify and segment objects that closely resemble and are seamlessly integrated into their surrounding environments, making it a challenging task in computer vision. COD is constrained by the limited availability of training data and annotated samples, and most carefully designed COD models exhibit diminished performance under low-data conditions. In recent years, there has been increasing interest in leveraging foundation models, which have demonstrated robust general capabilities and superior generalisation performance, to address COD challenges. This work proposes a knowledge-guided domain adaptation (KGDA) approach to tackle the data scarcity problem in COD. The method utilises the knowledge descriptions generated by multimodal large language models (MLLMs) for camouflaged images, aiming to enhance the model's comprehension of semantic objects and camouflaged scenes through highly abstract and generalised knowledge representations. To resolve ambiguities and errors in the generated text descriptions, a multi-level knowledge aggregation (MLKG) module is devised. This module consolidates consistent semantic knowledge and forms multi-level semantic knowledge features. To incorporate semantic knowledge into the visual foundation model, the authors introduce a knowledge-guided semantic enhancement adaptor (KSEA) that integrates the semantic knowledge of camouflaged objects while preserving the original knowledge of the foundation model. Extensive experiments demonstrate that the method surpasses 19 state-of-the-art approaches and exhibits strong generalisation capabilities even with limited annotated data.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70009
Citations: 0
Temporal Optimisation of Satellite Image-Based Crop Mapping: A Comparison of Deep Time Series and Semi-Supervised Time Warping Strategies
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-03-26 DOI: 10.1049/cvi2.70014
Rosie Finnegan, Joseph Metcalfe, Sara Sharifzadeh, Fabio Caraffini, Xianghua Xie, Alberto Hornero, Nicholas W. Synes
This study presents a novel approach to crop mapping using remotely sensed satellite images. It addresses two significant classification modelling challenges: (1) the requirement for extensive labelled data and (2) the complex optimisation problem of selecting appropriate temporal windows in the absence of prior knowledge of cultivation calendars. We compare the lightweight Dynamic Time Warping (DTW) classification method with the heavily supervised Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM) using high-resolution multispectral optical satellite imagery (3 m/pixel). Our approach integrates practical preprocessing steps, including data augmentation and a data-driven optimisation strategy for the temporal window, even in the presence of numerous crop classes. Our findings demonstrate that DTW, despite its lower data demands, can match the performance of CNN-LSTM through these preprocessing steps while significantly improving runtime. These results demonstrate that both CNN-LSTM and DTW can achieve deployment-level accuracy and underscore the potential of DTW as a viable alternative to more resource-intensive models. The results also show the effectiveness of temporal windowing for improving the runtime and accuracy of a crop classification study, even with no prior knowledge of planting timeframes.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70014
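As background on the DTW classifier mentioned in the abstract, the textbook dynamic-programming form of DTW with a 1-nearest-neighbour rule can be sketched as below. This is a minimal sketch, assuming univariate series (for example one vegetation-index value per acquisition date) and an absolute-difference local cost; it is not the study's pipeline, and `dtw_distance`, `classify` and the toy crop labels are hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] repeats against b[j-1]
                cost[i][j - 1],      # b[j-1] repeats against a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


def classify(series, references):
    """1-nearest-neighbour label under DTW; references is a list of (label, series)."""
    return min(references, key=lambda ref: dtw_distance(series, ref[1]))[0]


if __name__ == "__main__":
    # Toy reference profiles, invented for illustration only.
    references = [
        ("wheat", [0.1, 0.5, 0.9, 0.4]),
        ("maize", [0.9, 0.5, 0.1, 0.1]),
    ]
    print(classify([0.1, 0.5, 0.9, 0.5], references))
```

Because the cost table grows with the product of the two series lengths, restricting the series to a shorter temporal window directly reduces runtime, which is consistent with the abstract's emphasis on temporal windowing.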
Citations: 0
Crafting Transferable Adversarial Examples Against 3D Object Detection
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-03-26 DOI: 10.1049/cvi2.70011
Haiyan Long, Hai Chen, Mengyao Xu, Chonghao Zhang, Fulan Qian
3D object detection, which perceives the surrounding environment through LiDAR and camera sensors to recognise the category and location of objects in the scene, is currently a popular research topic. Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples. Although some approaches have begun to investigate the robustness of 3D object detection models, they currently generate adversarial examples in a white-box setting, and there is a lack of research into generating transferable adversarial examples in a black-box setting. In this paper, a non-end-to-end attack algorithm is proposed for LiDAR pipelines that crafts transferable adversarial examples against 3D object detection. Specifically, the method generates adversarial examples by restraining features with a high contribution to downstream tasks and amplifying features with a low contribution to downstream tasks in the feature space. Extensive experiments validate that the method produces more transferable adversarial point clouds; for example, on the nuScenes dataset it generates adversarial point clouds that are about 10% and 7% better than the state-of-the-art method on mAP and NDS, respectively.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70011
Citations: 0
Recent Advances of Continual Learning in Computer Vision: An Overview
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-03-19 DOI: 10.1049/cvi2.70013
Haoxuan Qu, Hossein Rahmani, Li Xu, Bryan Williams, Jun Liu
In contrast to batch learning, where all training data is available at once, continual learning represents a family of methods that accumulate knowledge and learn continuously with data available in sequential order. Because it resembles the human ability to learn, fuse and accumulate new knowledge acquired at different time steps, continual learning is considered to have high practical significance. Hence, continual learning has been studied in various artificial intelligence tasks. In this paper, we present a comprehensive review of the recent progress of continual learning in computer vision. In particular, the works are grouped by their representative techniques, including regularisation, knowledge distillation, memory, generative replay, parameter isolation and a combination of the above techniques. For each category of these techniques, both its characteristics and applications in computer vision are presented. At the end of this overview, several subareas where continuous knowledge accumulation is potentially helpful, but continual learning has not been well studied, are discussed.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70013
Citations: 0
A Review of Multi-Object Tracking in Recent Times
IF 1.5, Zone 4 (Computer Science)
IET Computer Vision Pub Date : 2025-03-09 DOI: 10.1049/cvi2.70010
Suya Li, Hengyi Ren, Xin Xie, Ying Cao
Multi-object tracking (MOT) is a fundamental problem in computer vision that involves tracing the trajectories of foreground targets throughout a video sequence while establishing correspondences for identical objects across frames. With the advancement of deep learning techniques, methods based on deep learning have significantly improved accuracy and efficiency in MOT. This paper reviews several recent deep learning-based MOT methods and categorises them into three main groups: detection-based, single-object tracking (SOT)-based, and segmentation-based methods, according to their core technologies. Additionally, this paper discusses the metrics and datasets used for evaluating MOT performance, the challenges faced in the field, and future directions for research.
Vol. 19, no. 1. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70010
Citations: 0