MP-YOLO: multidimensional feature fusion based layer adaptive pruning YOLO for dense vehicle object detection algorithm

IF 3.1; Zone 4 (Computer Science); JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation Pub Date : 2025-08-18 DOI:10.1016/j.jvcir.2025.104560

Wanzhen Zhou, Junjie Wang, Xi Meng, Jianxia Wang, Yufei Song, Zhiguo Liu

Volume 112, Article 104560. Full text: https://www.sciencedirect.com/science/article/pii/S1047320325001749

Citations: 0

Abstract

In recent years, artificial intelligence technology has been applied in the research and development of autonomous vehicles. However, the high energy consumption of artificial intelligence models and the high precision required of object detection in autonomous driving have held back the development of autonomous vehicles. To alleviate these problems, we optimize YOLOv8 and propose a lightweight vehicle object detection algorithm, MP-YOLO (Multidimensional feature fusion and layer adaptive pruning YOLO), which fits edge devices with limited storage while still meeting the requirements for detection accuracy. First, two multi-scale feature fusion modules, MSFB and HFF, are proposed to merge features of different dimensions, enhancing the model's feature learning capability. Second, a detection head at a scale of 160×160 is added to improve small object detection. Third, the WIoU loss function replaces the original CIoU loss function in YOLOv8 to address the high overlap among road objects. Finally, the Layer Adaptive Sparsity for Magnitude-based Pruning (LAMP) method is used to significantly reduce model size. MP-YOLO was tested on the recent autonomous driving dataset DAIR-V2X. The results show that it outperforms the original model, with improvements of 4.7% in AP₅₀ and 4.2% in AP, while the model size shrinks from 6 MB to 2.2 MB. It is superior to other classical detection models in both model size and accuracy, and meets the requirements for deployment on edge devices. The source code is available at https://github.com/Wang-jj-zs/MP-YOLO/tree/master.
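To make the pruning step named in the abstract more concrete, the following is a minimal sketch of the LAMP score as it is usually defined: within one layer, each weight's squared magnitude is divided by the sum of squared magnitudes of all weights in that layer that are at least as large, and the weights with the globally lowest scores are removed. This is not taken from the MP-YOLO repository. The function names `lamp_scores` and `lamp_prune`, the toy layer names, and the unstructured (per-weight) setting are all assumptions made for illustration; the paper may apply the score to channels or filters instead.

```python
def lamp_scores(weights):
    """LAMP score for each weight of one layer (flattened), in original order.

    score(u) = w_u^2 / sum of w_v^2 over all v with |w_v| >= |w_u|
    """
    # Indices sorted by ascending magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    scores = [0.0] * len(weights)
    suffix = 0.0
    # Walk from the largest magnitude down, accumulating the suffix sum
    for i in reversed(order):
        sq = weights[i] ** 2
        suffix += sq
        scores[i] = sq / suffix if suffix > 0 else 0.0
    return scores


def lamp_prune(layers, sparsity):
    """Zero the fraction `sparsity` of weights with the lowest LAMP scores.

    `layers` maps a (hypothetical) layer name to a flat list of weights.
    Scores are compared globally, across all layers.
    """
    scored = [
        (score, name, i)
        for name, w in layers.items()
        for i, score in enumerate(lamp_scores(w))
    ]
    scored.sort(key=lambda t: t[0])
    n_prune = int(len(scored) * sparsity)
    pruned = {name: list(w) for name, w in layers.items()}
    for _, name, i in scored[:n_prune]:
        pruned[name][i] = 0.0
    return pruned


# Toy example: two layers with very different weight scales (made-up values)
layers = {
    "conv1": [0.5, -0.1, 2.0],
    "conv2": [0.01, 0.02, -0.03, 0.04],
}
pruned = lamp_prune(layers, sparsity=0.5)
print(pruned)
```

With these toy values, the 0.5 weight in `conv1` is pruned while the much smaller surviving weights in `conv2` are kept. That is the layer-adaptive behaviour: a weight is judged relative to the other weights in its own layer, and the largest weight in each layer always scores 1, so no layer is emptied before the others.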

Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months

Journal description: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.