MFF-YOLO: An Improved YOLO Algorithm Based on Multi-Scale Semantic Feature Fusion
Junsan Zhang, Chenyang Xu, Shigen Shen, Jie Zhu, Peiying Zhang
Tsinghua Science and Technology, vol. 30, no. 5, pp. 2097-2113, published 2025-04-29. DOI: 10.26599/TST.2024.9010097. https://ieeexplore.ieee.org/document/10979796/
The YOLOv5 algorithm is widely used for object detection in edge computing systems. However, the limited computing resources of embedded devices and the large model size of existing deep-learning-based methods make real-time object detection on edge devices difficult. To address this issue, we propose an object detection algorithm that is smaller, less computationally intensive, and more accurate. Multi-scale Feature Fusion-YOLO (MFF-YOLO) is built on the YOLOv5s framework but makes substantial improvements to it. First, we design the MFF module to improve the feature propagation path in the feature pyramid, further integrating semantic information from feature layers on different paths. Second, a large-kernel convolution module is used in the bottleneck. This structure enlarges the receptive field and preserves shallow semantic information, which overcomes the performance limitation caused by uneven propagation in Feature Pyramid Networks (FPN). In addition, a multi-branch downsampling method based on depthwise separable convolutions and a bottleneck structure with deformable convolutions are designed to reduce the complexity of the backbone network and minimize the loss of real-time performance caused by the increased model complexity. Experimental results on the PASCAL VOC and MS COCO datasets show that, compared with YOLOv5s, MFF-YOLO reduces the number of parameters by 7% and the number of floating-point operations (FLOPs) by 11.8%, while improving mAP@0.5 by 3.7% and 5.5% and mAP@0.5:0.95 by 6.5% and 6.2% on PASCAL VOC and MS COCO, respectively. Furthermore, compared with YOLOv7-tiny, PP-YOLO-tiny, and other mainstream methods, MFF-YOLO achieves better results on multiple metrics.
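Why depthwise separable convolutions cut backbone complexity can be checked with a back-of-the-envelope count. The Python sketch below compares weights and multiply-accumulates (MACs) for a standard convolution and a depthwise-separable one (depthwise k x k followed by 1 x 1 pointwise). It is a generic cost model, not the paper's multi-branch downsampling module: bias and BatchNorm terms are ignored, and the layer sizes in the usage lines (128 to 256 channels, 3 x 3 kernel, 40 x 40 output) are hypothetical examples.

```python
# Back-of-the-envelope cost model (bias and BatchNorm terms ignored).

def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Weights and multiply-accumulates (MACs) of a standard k x k convolution."""
    params = k * k * c_in * c_out      # one k x k x c_in filter per output channel
    macs = params * h_out * w_out      # every weight is used once per output pixel
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Weights and MACs of a depthwise k x k conv followed by a 1 x 1 pointwise conv.

    Both stages are assumed to run at the output resolution (any stride is
    applied in the depthwise stage).
    """
    depthwise = k * k * c_in           # one k x k filter per input channel
    pointwise = c_in * c_out           # 1 x 1 conv mixes information across channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs


if __name__ == "__main__":
    # Hypothetical stride-2 downsampling layer: 128 -> 256 channels,
    # 3 x 3 kernel, producing a 40 x 40 output map.
    std = standard_conv_cost(128, 256, 3, 40, 40)
    dws = depthwise_separable_cost(128, 256, 3, 40, 40)
    print("standard:", std)                  # standard: (294912, 471859200)
    print("separable:", dws)                 # separable: (33920, 54272000)
    # The ratio equals 1/c_out + 1/k**2, independent of the spatial size.
    print(f"ratio: {dws[0] / std[0]:.3f}")   # ratio: 0.115
```

The saving comes almost entirely from the 1/k**2 term, so it grows with kernel size. Conventions differ between tools (some report FLOPs as twice the MAC count), so absolute numbers are only comparable within one convention.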
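The claim that a large-kernel bottleneck enlarges the receptive field follows from the standard recurrence for stacked convolutions: a layer with kernel size k adds (k - 1) times the current cumulative stride to the receptive field. The Python sketch below applies that recurrence to the theoretical receptive field only. Dilation and padding are ignored, and the stride-8 feature map, its incoming receptive field of 35 pixels, and the 1x1/3x3 versus 1x1/7x7 layer pairs are hypothetical choices for illustration, not the exact modules used in MFF-YOLO.

```python
def receptive_field(layers, rf=1, jump=1):
    """Apply the receptive-field recurrence for a stack of convolutions.

    layers: iterable of (kernel_size, stride) pairs; dilation and padding ignored.
    rf:     receptive field (input pixels) of one unit of the incoming feature map.
    jump:   spacing (input pixels) between adjacent units of the incoming map,
            i.e. its cumulative stride.
    Returns the (rf, jump) pair of the output map.
    """
    for k, s in layers:
        rf += (k - 1) * jump   # a k x k kernel spans (k - 1) extra incoming units
        jump *= s              # striding spreads out the next layer's units
    return rf, jump


if __name__ == "__main__":
    # Hypothetical stride-8 feature map whose units already see 35 input pixels.
    small, _ = receptive_field([(1, 1), (3, 1)], rf=35, jump=8)  # 1x1 -> 3x3 bottleneck
    large, _ = receptive_field([(1, 1), (7, 1)], rf=35, jump=8)  # 1x1 -> 7x7 (large kernel)
    print(small, large)  # 51 83
```

On a stride-8 map, swapping the 3 x 3 kernel for a 7 x 7 one adds 48 rather than 16 input pixels per layer, which is why large kernels are most effective on deeper, coarser feature maps.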
About the journal:
Tsinghua Science and Technology (Tsinghua Sci Technol) started publication in 1996. It is an international academic journal sponsored by Tsinghua University and is published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.