MSYOLOF: Multi-input-single-output encoder network with tripartite feature enhancement for object detection

Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems. Pub Date: 2023-07-28. DOI: 10.1145/3609703.3609710

Gong Cheng, Xi Yong, Xin Lyu, Tao Zeng, Xinyu Wang, Jiale Chen, Xin Li


Citations: 0

Abstract

Object detection from a one-level feature is a challenging task, because object representations at different scales must all be extracted from a single feature map. Existing detectors that use a one-level feature represent objects of different scales inadequately, which results in low accuracy for multi-scale object detection, especially for smaller objects. To address this problem, a novel object detector named MSYOLOF is proposed to construct an effective single feature map for detecting objects of different scales. The network introduces three modules: Feature Pyramid Pooling (FPP), Feature Perception Enhancement (FPE), and Dual Branch Receptive Field (DBRF). First, the FPP module aggregates contextual information from various regions to improve the network's ability to capture global information, which strengthens the model's understanding of the overall scene. Second, the FPE module uses coordinate attention to construct a residual block that obtains orientation-aware and position-sensitive information, helping the network locate and identify objects of interest accurately. Third, by rethinking the Dilated Encoder of YOLOF, the DBRF module reduces information loss and mitigates the problem that dilated convolution with large dilation rates is sensitive only to large objects. Extensive experiments on the COCO benchmark validate the effectiveness of the network, which exhibits superior performance compared to other state-of-the-art networks.
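The remark about large dilation rates rests on simple arithmetic, which can be sketched as follows. This illustrates a general property of dilated convolution and is not the DBRF module itself, whose design the abstract does not specify. The 3x3 kernel and the dilation rates 2, 4, 6, 8 are illustrative assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels along one axis, covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def stacked_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    field = 1
    for dilation in dilations:
        # Each stride-1 layer widens the field by (effective span - 1).
        field += effective_kernel_size(kernel_size, dilation) - 1
    return field


def sampled_fraction(kernel_size: int, dilation: int) -> float:
    """Fraction of positions inside the kernel's span that are actually read."""
    return kernel_size / effective_kernel_size(kernel_size, dilation)


# Assumed, illustrative settings: 3x3 kernels with growing dilation rates.
for rate in (1, 2, 4, 6, 8):
    span = effective_kernel_size(3, rate)
    print(f"dilation={rate}: span={span}, sampled={sampled_fraction(3, rate):.2f}")
```

As the rate grows, the kernel spans a wider window but reads a smaller share of the positions inside it, so fine detail from small objects is more likely to fall between the sampled taps. That is one plausible reading of the information loss the abstract mentions.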
