Ali Khan;Somaiya Khan;Mohammed A. M. Elhassan;Izhar Ahmed Khan;Hai Deng;Mohammed Alsuhaibani
IEEE Geoscience and Remote Sensing Letters, vol. 22, pp. 1-5. Journal Article. Published 2025-04-08.
DOI: 10.1109/LGRS.2025.3558423
https://ieeexplore.ieee.org/document/10950435/
VDXNet: A Novel Lightweight Deep Learning Model for Vehicle Detection With Aerial Images
In intelligent transportation systems (ITSs), real-time vehicle detection based on aerial images is crucial for effective traffic monitoring and decision-making. However, detecting small vehicles with varying orientations in complex backgrounds remains technically challenging, as existing models often struggle to balance detection accuracy against computational efficiency. In this letter, we introduce the vehicle detection eXtended network (VDXNet), a lightweight model that achieves high detection performance while minimizing computational complexity. VDXNet incorporates the novel residual cross depth fusion (RxDF) module to enhance feature extraction in the backbone. Furthermore, it uses newly proposed lightweight feature pyramid pooling (LiteFPP) and channel reduction downsampling (CRDown) modules to support multiscale detection and spatial dimensionality reduction. These innovations streamline the model’s neck, reducing complexity while ensuring accurate detection of vehicles across diverse scales, angles, and backgrounds. Evaluations on the UCAS-AOD, VEDAI, UAV-ROD, and UAVDT datasets demonstrate that VDXNet achieves substantial reductions in model complexity, with 1.608M parameters (a decrease of 37.72%) and 5.9 GFLOPs (a decrease of 6.35%) compared with the YOLO11n model. Despite these efficiency gains, VDXNet also improves mAP by 0.52%, achieving 96.3% mAP on the UCAS-AOD dataset.
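The complexity figures above are stated relative to YOLO11n, but the abstract does not give the baseline values themselves. As a rough sanity check, the sketch below backs out the baseline implied by each reported value and its percentage decrease. The function name `implied_baseline` is hypothetical, and the resulting numbers are derived from the abstract's own percentages, not taken from the letter.

```python
def implied_baseline(reduced_value: float, percent_decrease: float) -> float:
    """Return the baseline implied by a reduced value and its percent decrease.

    If reduced = baseline * (1 - p / 100), then baseline = reduced / (1 - p / 100).
    """
    if not 0.0 <= percent_decrease < 100.0:
        raise ValueError("percent_decrease must be in [0, 100)")
    return reduced_value / (1.0 - percent_decrease / 100.0)


# Values reported in the abstract for VDXNet.
vdxnet_params_m = 1.608   # parameters, in millions
vdxnet_gflops = 5.9       # GFLOPs

# Baselines implied by the reported decreases (derived, not quoted from the letter).
baseline_params_m = implied_baseline(vdxnet_params_m, 37.72)
baseline_gflops = implied_baseline(vdxnet_gflops, 6.35)

print(f"Implied baseline parameters: {baseline_params_m:.3f}M")
print(f"Implied baseline GFLOPs: {baseline_gflops:.3f}")
```

Under these assumptions the implied baseline is about 2.58M parameters and 6.30 GFLOPs. The same check is not applied to the mAP figure, because the abstract does not say whether the 0.52% gain is relative or in percentage points.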