Detection of moving small targets in infrared images for urban traffic monitoring
Juan Wang, Hao Yang, Zizhen Zhang, Nan Zhao, Jixiang Shao, Minghua Wu, Zhigang Ma, Jialu Zhu, Xu An Wang, Haina Song
Internet of Things, Volume 33, Article 101673 (2025). DOI: 10.1016/j.iot.2025.101673
Citations: 0
Abstract
The Internet of Vehicles (IoV) and autonomous driving technologies require increasingly robust object detection capabilities, especially for small objects. However, reliably detecting small objects in urban traffic scenarios remains technically challenging under adverse weather conditions, including low illumination, rain, and snow. To address these challenges, we propose a fused IR–visible imaging approach using an enhanced YOLOv9 architecture. The proposed method employs a dual-branch semantic enhancement architecture, which achieves dynamic inter-modal feature weighting through a channel attention mechanism. The visible branch preserves texture details, while the infrared branch extracts thermal radiation characteristics, followed by multi-scale feature-level fusion. Firstly, we present UR-YOLO, a detector designed for small targets in urban traffic environments. Secondly, we propose a novel DeeperFuse module that incorporates dual-branch semantic enhancement and channel attention mechanisms for effective multimodal feature fusion. Finally, by jointly optimizing fusion and detection losses, the method preserves critical details and enhances clarity and contrast. Experimental evaluation on the M³FD dataset demonstrates improved detection performance relative to the baseline YOLOv9 model. The results show an increase of 1.4 percentage points in mAP (from 83.3% to 84.7%) and 2.2 percentage points in AP_small (from 51.6% to 53.8%). Furthermore, our method achieves real-time processing at 30 FPS, making it suitable for deployment in urban autonomous driving scenarios. Future work will focus on enhancing model performance via multimodal fusion, lightweight design, and multi-scale feature learning. We will also develop diverse datasets to advance autonomous driving perception in complex environments.
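The abstract only names the mechanism, so the following is a minimal sketch of what channel-attention-based inter-modal weighting typically looks like, assuming PyTorch. It is not the authors' DeeperFuse implementation; the class name, reduction ratio, and layer layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Hypothetical sketch: fuse visible and infrared feature maps with a
    squeeze-and-excitation style channel gate (not the paper's DeeperFuse)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global spatial squeeze
        self.gate = nn.Sequential(                   # per-channel excitation
            nn.Conv2d(2 * channels, 2 * channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, vis: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([vis, ir], dim=1)          # stack the two modalities
        weights = self.gate(self.pool(fused))        # dynamic inter-modal weights
        return self.project(fused * weights)         # re-weight, then compress

# Same-resolution feature maps from the two backbone branches:
vis = torch.randn(1, 256, 40, 40)
ir = torch.randn(1, 256, 40, 40)
out = ChannelAttentionFusion(256)(vis, ir)           # -> (1, 256, 40, 40)
```

Applying such a block at several backbone stages would yield the multi-scale feature-level fusion the abstract describes.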
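Likewise, "jointly optimizing fusion and detection losses" usually means a weighted sum trained end to end. The abstract does not specify the paper's loss terms, so the intensity-preservation term and the weight lam below are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def joint_loss(det_loss: torch.Tensor,
               fused_img: torch.Tensor,
               vis_img: torch.Tensor,
               ir_img: torch.Tensor,
               lam: float = 0.5) -> torch.Tensor:
    # Illustrative fusion term: keep the fused image close to the brighter
    # source pixel, a common intensity-preservation heuristic in IR-visible fusion.
    fusion_loss = F.l1_loss(fused_img, torch.maximum(vis_img, ir_img))
    # A single backward pass then optimizes detection and fusion together.
    return det_loss + lam * fusion_loss
```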
About the journal:
Internet of Things: Engineering Cyber Physical Human Systems is a comprehensive journal encouraging cross-collaboration between researchers, engineers, and practitioners in the field of IoT and Cyber Physical Human Systems. The journal offers a unique platform for exchanging scientific information on the entire breadth of technology, science, and societal applications of the IoT.
The journal places a high priority on timely publication and provides a home for high-quality research.
Furthermore, the journal is interested in publishing topical Special Issues on any aspect of the IoT.