Lei Shi, Shuai Ren, Xing Fan, Ke Wang, Shan Lin, Zhanwen Liu
Title: Modified You Only Look Once Network Model for Enhanced Traffic Scene Detection Performance for Small Targets
Journal: IET Image Processing, 19(1)
Published: 2025-02-14
DOI: 10.1049/ipr2.70014
URL: https://onlinelibrary.wiley.com/doi/10.1049/ipr2.70014
Journal impact factor: 2.0; JCR Q3 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
To address the challenge of small-target recognition in traffic scenes, we propose a model based on the You Only Look Once version 8X (Yolov8X) network, combined with a receptive field block (RFB) and multidimensional collaborative attention (MCA). First, the model employs the RFB to extract reliable and distinctive features, improving the precision of small-target identification. Second, the MCA structure models multidimensional attention through three parallel branches, strengthening the model's feature expression ability. The MCA structure uses a squeeze (compression) transformation and an excitation transformation to capture discriminative feature representations. These transformations increase the expressiveness and diversity of the features and help the network locate small objects more accurately, thereby improving small-object detection performance. In addition, data augmentation and hyperparameter optimization are used to improve the model's generalisability. Validation results on the Argoverse 1.1 autonomous driving dataset show that the enhanced network model outperforms prevailing detectors, achieving an F1 score of 78.6, an average precision of 55.1, and an average recall of 72.4. Visual analysis further illustrates the algorithm's performance on small-target detection, indicating its application value and potential for adoption in fields such as autonomous driving.
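The three-branch attention idea mentioned in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`squeeze`, `excite`, `three_branch_attention`), the equal weighting of mean and standard deviation in the squeeze step, the fixed smoothing kernel in the excitation step, and the averaging of the three branch outputs are all assumptions made for illustration. The input is a plain nested list of shape C x H x W.

```python
import math


def squeeze(x, axis):
    """Squeeze step: one summary value per index along `axis` (0=C, 1=H, 2=W)."""
    dims = (len(x), len(x[0]), len(x[0][0]))
    groups = [[] for _ in range(dims[axis])]
    for c in range(dims[0]):
        for h in range(dims[1]):
            for w in range(dims[2]):
                groups[(c, h, w)[axis]].append(x[c][h][w])
    summary = []
    for g in groups:
        mean = sum(g) / len(g)
        std = math.sqrt(sum((v - mean) ** 2 for v in g) / len(g))
        # Assumption: mean and standard deviation are weighted equally.
        summary.append(0.5 * (mean + std))
    return summary


def excite(s, kernel=(0.25, 0.5, 0.25)):
    """Excitation step: zero-padded 1-D convolution, then a sigmoid gate in (0, 1)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(s) + [0.0] * pad
    gates = []
    for i in range(len(s)):
        z = sum(k * padded[i + j] for j, k in enumerate(kernel))
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return gates


def three_branch_attention(x):
    """Gate x along channel, height, and width, then average the three branches."""
    dims = (len(x), len(x[0]), len(x[0][0]))
    gates = [excite(squeeze(x, axis)) for axis in range(3)]
    out = [[[0.0] * dims[2] for _ in range(dims[1])] for _ in range(dims[0])]
    for c in range(dims[0]):
        for h in range(dims[1]):
            for w in range(dims[2]):
                # Averaging x*g_c, x*g_h, x*g_w equals x times the mean gate.
                gate = (gates[0][c] + gates[1][h] + gates[2][w]) / 3.0
                out[c][h][w] = x[c][h][w] * gate
    return out
```

In a trained network the convolution kernel and the pooling weights would be learned parameters rather than fixed constants; they are fixed here only to keep the sketch self-contained.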
About the journal:
The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.
Principal topics include:
Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.
Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.
Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.
Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.
Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.
Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.
Current Special Issue Call for Papers:
Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf
AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf
Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf
Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf