WGS-YOLO: A real-time object detector based on YOLO framework for autonomous driving

IF 4.3 | Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding | Pub Date: 2024-10-03 | DOI: 10.1016/j.cviu.2024.104200

Shiqin Yue, Ziyi Zhang, Ying Shi, Yonghua Cai

Volume 249, Article 104200. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1077314224002819

Citations: 0

Abstract

The safety and reliability of autonomous driving depend on the precision and efficiency of object detection systems. In this paper, a refined adaptation of the YOLO architecture (WGS-YOLO) is developed to improve the detection of pedestrians and vehicles. Specifically, its information fusion is enhanced by incorporating the Weighted Efficient Layer Aggregation Network (W-ELAN) module, an innovative dynamic weighted feature fusion module that uses channel shuffling. Meanwhile, the computational demands and parameter count of the proposed WGS-YOLO are significantly reduced by employing the strategically designed Space-to-Depth Convolution (SPD-Conv) and Grouped Spatial Pyramid Pooling (GSPP) modules. The performance of the model is evaluated on the BDD100k and DAIR-V2X-V datasets. In terms of mean Average Precision ($\text{mAP}_{0.5}$), the proposed model outperforms the baseline YOLOv7 by 12%. Furthermore, extensive experiments are conducted to verify the analysis and the model's robustness across diverse scenarios.
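The abstract names two generic tensor operations, space-to-depth rearrangement and channel shuffling, without giving the module internals. The NumPy sketch below illustrates only those two operations in their commonly used form. It is not the authors' SPD-Conv or W-ELAN implementation; the function names `space_to_depth` and `channel_shuffle`, the `(C, H, W)` layout, and the toy input are assumptions made for illustration.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange a (C, H, W) map into (C * scale**2, H // scale, W // scale).

    Each scale x scale spatial patch is moved into the channel axis, so the
    map is downsampled without discarding any values.
    """
    c, h, w = x.shape
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    # Split both spatial axes into (block index, offset inside the block).
    x = x.reshape(c, h // scale, scale, w // scale, scale)
    # Move the two intra-block offsets in front of the channel axis.
    x = x.transpose(2, 4, 0, 1, 3)
    return x.reshape(c * scale * scale, h // scale, w // scale)


def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave the channels of a (C, H, W) map across `groups` groups."""
    c, h, w = x.shape
    if c % groups:
        raise ValueError("C must be divisible by groups")
    x = x.reshape(groups, c // groups, h, w)
    # Swap the group axis and the within-group axis, then flatten again.
    x = x.transpose(1, 0, 2, 3)
    return x.reshape(c, h, w)


# Toy feature map (hypothetical data): 2 channels, 4 x 4 spatial size.
feature = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
packed = space_to_depth(feature)              # 8 channels, 2 x 2 spatial size
shuffled = channel_shuffle(packed, groups=2)  # same shape, channels interleaved
```

In a detector, a convolution would typically follow the space-to-depth step, and the shuffle would sit between grouped operations so that information mixes across groups. How WGS-YOLO combines these steps with learned fusion weights is not specified in the abstract.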


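Source journal

Computer Vision and Image Understanding (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days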

Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research Areas Include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems