MFF-YOLO: An Improved YOLO Algorithm Based on Multi-Scale Semantic Feature Fusion

IF 3.5 | Zone 1, Computer Science | JCR Q1, Multidisciplinary

Tsinghua Science and Technology | Pub Date: 2025-04-29 | DOI: 10.26599/TST.2024.9010097

Junsan Zhang;Chenyang Xu;Shigen Shen;Jie Zhu;Peiying Zhang

Tsinghua Science and Technology, vol. 30, no. 5, pp. 2097-2113. Journal Article. https://ieeexplore.ieee.org/document/10979796/

Citations: 0

Abstract

The YOLOv5 algorithm is widely used in edge computing systems for object detection. However, the limited computing resources of embedded devices and the large model size of existing deep-learning-based methods make real-time object detection on edge devices difficult. To address this issue, we propose a smaller, less computationally intensive, and more accurate object detection algorithm. Multi-scale Feature Fusion-YOLO (MFF-YOLO) is built on the YOLOv5s framework but improves on it substantially. First, we design the MFF module to improve the feature propagation path in the feature pyramid, which further integrates semantic information from different paths of the feature layers. Then, a large-convolution-kernel module is used in the bottleneck. This structure enlarges the receptive field and preserves shallow semantic information, which overcomes the performance limitation arising from uneven propagation in Feature Pyramid Networks (FPN). In addition, a multi-branch downsampling method based on depthwise separable convolutions and a bottleneck structure with deformable convolutions are designed to reduce the complexity of the backbone network and to minimize the real-time performance loss caused by the increased model complexity. Experimental results on the PASCAL VOC and MS COCO datasets show that, compared with YOLOv5s, MFF-YOLO reduces the number of parameters by 7% and the number of floating-point operations (FLOPs) by 11.8%. On PASCAL VOC and MS COCO, respectively, mAP@0.5 improves by 3.7% and 5.5%, and mAP@0.5:0.95 improves by 6.5% and 6.2%. Furthermore, compared with YOLOv7-tiny, PP-YOLO-tiny, and other mainstream methods, MFF-YOLO achieves better results on multiple indicators.
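The abstract credits part of the parameter reduction to depthwise separable convolutions. As a rough illustration of why that substitution shrinks a layer, the sketch below compares weight counts for a standard convolution and a depthwise separable one using the textbook formulas. It is not the paper's downsampling module: the function names and the layer sizes are hypothetical, and biases and normalization layers are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def relative_saving(c_in, c_out, k):
    """Fraction of weights removed by switching to the separable form."""
    standard = conv_params(c_in, c_out, k)
    separable = depthwise_separable_params(c_in, c_out, k)
    return 1.0 - separable / standard


if __name__ == "__main__":
    # Hypothetical layer sizes for illustration only; not taken from the paper.
    for c_in, c_out, k in [(64, 128, 3), (128, 256, 3), (256, 256, 7)]:
        print(
            f"c_in={c_in} c_out={c_out} k={k}: "
            f"standard={conv_params(c_in, c_out, k)} "
            f"separable={depthwise_separable_params(c_in, c_out, k)} "
            f"saving={relative_saving(c_in, c_out, k):.1%}"
        )
```

The ratio of separable to standard weights works out to 1/c_out + 1/k², so the saving grows with both kernel size and output width. A whole-model figure such as the reported 7% is much smaller than any per-layer saving because only some layers are replaced and other modules add parameters.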

Source journal

Tsinghua Science and Technology (Computer Science, Information Systems; Computer Science, Software Engineering)

CiteScore: 10.20

Self-citation rate: 10.60%

Articles published: 2340

About the journal: Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.