Title: LVD-YOLO: An efficient lightweight vehicle detection model for intelligent transportation systems
Authors: Hao Pan, Shaopeng Guan, Xiaoyan Zhao
Journal: Image and Vision Computing, volume 151, Article 105276
Publication date: 2024-09-16
Publication type: Journal Article
DOI: 10.1016/j.imavis.2024.105276
URL: https://www.sciencedirect.com/science/article/pii/S0262885624003810
Impact factor: 4.2 (JCR Q2, Computer Science, Artificial Intelligence)
Open access: No
Citations: 0
Abstract
Vehicle detection is a fundamental component of intelligent transportation systems. However, current algorithms often encounter issues such as high computational complexity, long execution times, and significant resource demands, making them unsuitable for resource-limited environments. To overcome these challenges, we propose LVD-YOLO, a Lightweight Vehicle Detection Model based on YOLO. This model incorporates the EfficientNetV2 network structure as its backbone, which reduces parameters and enhances feature extraction capabilities. By utilizing a bidirectional feature pyramid structure along with a dual attention mechanism, we enable efficient information exchange across feature layers, thereby improving multiscale feature fusion. Additionally, we refine the model's loss function with SIoU loss to boost regression and prediction performance. Experimental results on the PASCAL VOC and MS COCO datasets show that LVD-YOLO outperforms YOLOv5s, achieving a 0.5% increase in accuracy while reducing FLOPs by 64.6% and parameters by 48.6%. These improvements highlight its effectiveness for use in resource-constrained environments.
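To put the reported reductions in perspective, the percentages can be converted into the fraction of the YOLOv5s cost that remains and the corresponding shrink factor. The sketch below uses only the two percentages quoted in the abstract; the helper names `remaining_fraction` and `shrink_factor` are illustrative and do not come from the paper, and no absolute FLOPs or parameter counts are assumed.

```python
def remaining_fraction(reduction_percent: float) -> float:
    """Fraction of the baseline cost left after a relative reduction."""
    return 1.0 - reduction_percent / 100.0


def shrink_factor(reduction_percent: float) -> float:
    """How many times smaller the reduced cost is than the baseline."""
    return 1.0 / remaining_fraction(reduction_percent)


# Reductions relative to YOLOv5s, as stated in the abstract.
reported = {"FLOPs": 64.6, "parameters": 48.6}

for name, percent in reported.items():
    print(
        f"{name}: {remaining_fraction(percent):.3f} of baseline remains, "
        f"about {shrink_factor(percent):.2f}x smaller"
    )
```

Read this way, a 64.6% reduction leaves roughly 35% of the baseline FLOPs (close to a 2.8x reduction), while a 48.6% reduction leaves roughly 51% of the parameters (close to a 1.9x reduction).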
About the journal:
The primary aim of Image and Vision Computing is to provide an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.