Object Detection Based on CNN and Vision-Transformer: A Survey

IF 1.3 · Zone 4 (Computer Science) · JCR Q4: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Computer Vision · Pub Date: 2025-05-31 · DOI: 10.1049/cvi2.70028

Jinfeng Cao, Bo Peng, Mingzhong Gao, Haichun Hao, Xinfang Li, Hongwei Mou

Volume 19, issue 1. Full text: https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/cvi2.70028 (PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.70028)

Citations: 0

Abstract

Object detection is one of the most important and challenging tasks in computer vision and has been applied in recent years in fields such as autonomous driving and industrial inspection. Traditional object detection methods rely mainly on sliding windows and handcrafted features, which capture image content poorly and yield low detection accuracy. With the rapid advancement of deep learning, convolutional neural networks (CNNs) and vision transformers have become fundamental components of object detection models. These components can learn higher-level, deeper image representations, which has led to a breakthrough in detection performance. In this review, we survey representative object detection models of the deep learning era, tracing their architectural shifts and technological breakthroughs. We also discuss key challenges and promising research directions in object detection. The review aims to give practitioners a comprehensive foundation for understanding object detection technologies.




Source journal

IET Computer Vision (Engineering & Technology; Engineering, Electrical & Electronic)

CiteScore

3.30

Self-citation rate

11.80%

Articles published

Review time

3.4 months

Journal description: IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The journal aims to publish the highest-quality research that is relevant and topical to the field, without neglecting work that introduces new horizons and sets the agenda for future research in computer vision.

IET Computer Vision welcomes submissions on the following topics: Biologically and perceptually motivated approaches to low level vision (feature detection, etc.); Perceptual grouping and organisation; Representation, analysis and matching of 2D and 3D shape; Shape-from-X; Object recognition; Image understanding; Learning with visual inputs; Motion analysis and object tracking; Multiview scene analysis; Cognitive approaches in low, mid and high level vision; Control in visual systems; Colour, reflectance and light; Statistical and probabilistic models; Face and gesture; Surveillance; Biometrics and security; Robotics; Vehicle guidance; Automatic model acquisition; Medical image analysis and understanding; Aerial scene analysis and remote sensing; Deep learning models in computer vision.

Both methodological and applications orientated papers are welcome. Submitted manuscripts are expected to include a detailed and analytical review of the literature, a state-of-the-art exposition of the proposed research and its methodology, a thorough experimental evaluation, and a comparative evaluation against relevant state-of-the-art methods. Submissions that do not meet these minimum requirements may be returned to authors without being sent to review.

Special Issues, Current Call for Papers:
Computer Vision for Smart Cameras and Camera Networks - https://digital-library.theiet.org/files/IET_CVI_SC.pdf
Computer Vision for the Creative Industries - https://digital-library.theiet.org/files/IET_CVI_CVCI.pdf