Object detection using convolutional neural networks and transformer-based models: a review

Shrishti Shah, Jitendra Tembhurne
{"title":"Object detection using convolutional neural networks and transformer-based models: a review","authors":"Shrishti Shah, Jitendra Tembhurne","doi":"10.1186/s43067-023-00123-z","DOIUrl":null,"url":null,"abstract":"Transformer models are evolving rapidly in standard natural language processing tasks; however, their application is drastically proliferating in computer vision (CV) as well. Transformers are either replacing convolution networks or being used in conjunction with them. This paper aims to differentiate the design of convolutional neural networks (CNNs) built models and models based on transformer, particularly in the domain of object detection. CNNs are designed to capture local spatial patterns through convolutional layers, which is well suited for tasks that involve understanding visual hierarchies and features. However, transformers bring a new paradigm to CV by leveraging self-attention mechanisms, which allows to capture both local and global context in images. Here, we target the various aspects such as basic level of understanding, comparative study, application of attention model, and highlighting tremendous growth along with delivering efficiency are presented effectively for object detection task. The main emphasis of this work is to offer basic understanding of architectures for object detection task and motivates to adopt the same in computer vision tasks. In addition, this paper highlights the evolution of transformer-based models in object detection and their growing importance in the field of computer vision, we also identified the open research direction in the same field.","PeriodicalId":100777,"journal":{"name":"Journal of Electrical Systems and Information Technology","volume":"86 2","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Electrical Systems and Information Technology","FirstCategoryId":"0","ListUrlMain":"https://doi.org/10.1186/s43067-023-00123-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Transformer models are evolving rapidly in standard natural language processing tasks; however, their application is also proliferating rapidly in computer vision (CV). Transformers are either replacing convolutional networks or being used in conjunction with them. This paper aims to differentiate the design of models built on convolutional neural networks (CNNs) from models based on transformers, particularly in the domain of object detection. CNNs are designed to capture local spatial patterns through convolutional layers, which makes them well suited to tasks that involve understanding visual hierarchies and features. Transformers, however, bring a new paradigm to CV by leveraging self-attention mechanisms, which allow a model to capture both local and global context in images. Here, we address several aspects of the object detection task: a basic level of understanding, a comparative study, the application of attention models, and a discussion of the field's rapid growth and efficiency gains. The main emphasis of this work is to offer a basic understanding of architectures for the object detection task and to motivate their adoption in computer vision tasks. In addition, this paper highlights the evolution of transformer-based models in object detection and their growing importance in computer vision, and identifies open research directions in this field.
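The following minimal sketch (not taken from the paper; layer sizes and the 16x16 patch size are illustrative assumptions in the style of ViT-like detectors) contrasts the two mechanisms the abstract compares: a convolutional layer, where each output value depends only on a small local neighborhood, and a self-attention layer over image patches, where every patch can attend to every other patch and thus use global context.

```python
# Hypothetical comparison of local (convolution) vs. global (self-attention) feature mixing.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # a single RGB image

# CNN-style block: a 3x3 kernel, so each output location sees only a 3x3 neighborhood.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
local_features = conv(x)  # shape (1, 64, 224, 224)

# Transformer-style block: split the image into 16x16 patches, embed each patch,
# then apply multi-head self-attention so every patch can use global context.
patch_embed = nn.Conv2d(3, 64, kernel_size=16, stride=16)    # ViT-style patchify (assumed sizes)
tokens = patch_embed(x).flatten(2).transpose(1, 2)            # (1, 196, 64) patch tokens
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
global_features, attn_weights = attn(tokens, tokens, tokens)  # each patch attends to all 196

print(local_features.shape, global_features.shape)
```

In practice, detection backbones stack many such layers; the sketch only illustrates why convolutions are biased toward local spatial patterns while self-attention provides image-wide context in a single layer.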