YOLO-HyperVision: A vision transformer backbone-based enhancement of YOLOv5 for detection of dynamic traffic information

IF 5 · Zone 3 (Computer Science) · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Shizhou Xu, Mengjie Zhang, Jingyu Chen, Yiming Zhong
DOI: 10.1016/j.eij.2024.100523
Journal: Egyptian Informatics Journal
Publication date: 2024-08-06
Publication type: Journal Article
Full text: https://www.sciencedirect.com/science/article/pii/S1110866524000860
Citations: 0

Abstract

As traffic volumes grow in modern cities, congestion has become a serious problem that disrupts daily life and economic activity. Automated object detection can replace manual monitoring, assessing road conditions quickly and providing timely traffic-flow information. When drones observe traffic from the air, however, perspective makes vehicles and pedestrians appear very small, and object scale varies widely across categories, which makes detection harder for a single convolutional neural network (CNN). To address the low accuracy of conventional single-stage detectors, this study proposes You Only Look Once-HyperVision (YOLO-HV), an improved YOLOv5 vehicle detection model with a Vision Transformer (ViT) backbone. YOLO-HV targets the poor multi-scale recognition that results from the inability of conventional CNNs to integrate contextual information, with the goal of making drone-based traffic-flow recognition more efficient and accurate. The model tightly integrates the ViT backbone with a CNN, combining the multi-scale detection strengths of the ViT with the inductive bias of the CNN, and adds multi-scale residual modules and context correlation enhancement modules, which markedly improves the accuracy of single-stage detection on drone images. Comparative experiments on the VisDrone dataset show that the model outperforms several commonly used detectors: YOLO-HV raises mean average precision (mAP) by 3.3% over a purely convolutional network of the same model size. On drone-captured traffic images, YOLO-HV identifies and classifies road vehicles more accurately than the detection models it was compared against.
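
For readers unfamiliar with the mAP figure quoted above, the following is a minimal sketch of how mean average precision is commonly computed: per-class average precision (area under the precision-recall curve, all-point interpolation), averaged over classes. The abstract does not state the evaluation protocol used (IoU thresholds, VisDrone toolkit settings), so this is a generic illustration and not the authors' evaluation code. The function names and the input layout are hypothetical, and each detection is assumed to have already been matched to ground truth.

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, whether it matched a ground-truth box).
# The matching step (e.g. by an IoU threshold) is assumed to be done already.
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Area under the precision-recall curve for one class."""
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_hit) in enumerate(ranked, start=1):
        true_positives += int(is_hit)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangle under the envelope at each recall step.
    ap = 0.0
    prev_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Unweighted mean of per-class AP; per_class maps name -> (detections, num_gt)."""
    aps = [average_precision(dets, num_gt) for dets, num_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Because the mean is unweighted, a class with few instances counts as much as a frequent one, which is one reason small, rare categories in aerial imagery can weigh heavily on the reported score.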

Source journal
Egyptian Informatics Journal
Subject area: Decision Sciences - Management Science and Operations Research
CiteScore: 11.10
Self-citation rate: 1.90%
Articles published: 59
Review time: 110 days
About the journal: The Egyptian Informatics Journal is published by the Faculty of Computers and Artificial Intelligence, Cairo University. The journal provides a forum for state-of-the-art research and development in computing, including computer sciences, information technologies, information systems, operations research and decision support. Innovative, previously unpublished work on subjects the journal covers is welcome from academic, research, or commercial sources.