Lightweight object detection model in remote sensing image by combining rotation box and attention mechanism

Q3 Computer Science

Journal of Image and Graphics (中国图象图形学报) Pub Date: 2023-01-01 DOI: 10.11834/jig.220839

Li Zhaohui, An Jintang, Jia Hongyu, Fang Yan

Citations: 0

Abstract

Objective Object detection in remote sensing images plays an important role in national defense, intelligent monitoring, and other fields. For densely arranged, arbitrarily oriented objects in remote sensing images, traditional horizontal-box detection cannot localize objects precisely. Large and ultralarge detection networks have strong representation learning capabilities, but they neglect the trade-off between accuracy and the amount of computation and number of parameters, cannot meet real-time detection requirements, and are hard to deploy because of their huge parameter counts and computational cost. To address these problems, a lightweight rotated-box object detection model for remote sensing images (YOLO-RMV4) is designed. Method The original MobileNetv3 network is improved: a better-performing channel attention module (efficient channel attention, ECA) is added to the feature extraction network, the network scale is appropriately expanded, and a path aggregation network (PANet) is added to fuse the backbone features at multiple scales, which provides richer and more reliable target features. The detection head uses multiscale detection to handle objects of different sizes, and its angle prediction uses the circular smooth label (CSL), which converts angle regression into classification so that the distance between the predicted and true angles can be measured. Result The proposed model is validated on the prepared AVSP (aerial images of vehicle ship and plane) dataset and compared with seven mainstream lightweight network models. The model size (5.3 MB) is only about 1/8 of that of RYOLOv5l (45.3 MB), while the mean average precision (mAP) is 1.2% higher and the average recall (AR) is 1.6% higher than those of RYOLOv5l. Its mAP and AR are also far higher than those of the other lightweight network models. Ablation experiments on each improved module verify how much each module contributes to model performance. Conclusion With a lightweight network structure supplemented by multiscale fusion and rotated-box detection, the proposed model achieves real-time inference and high-precision detection with a very limited number of parameters.

Objective Remote sensing image object detection plays an important role in military security, maritime traffic supervision, intelligent monitoring, and other fields. Remote sensing images differ from natural images. Most are taken at altitudes ranging from several kilometers to tens of thousands of meters, so the range of object scales is large: most objects, such as small vehicles, are small, whereas others, such as ships, are huge. The orientations of the objects are distributed arbitrarily because of the shooting angle. These properties pose a huge challenge for the feature extraction network in remote sensing image object detection, particularly in complex backgrounds. Given the continuous improvement in the computing power of hardware devices and the rapid development of deep learning theory, large and ultralarge object detection networks have been continuously proposed in recent years to improve detection accuracy. Although these networks have strong representation learning capabilities, they ignore the cost-effectiveness of detection accuracy relative to the amount of computation and the number of parameters. Moreover, real-time detection is difficult to achieve, and the large number of parameters and amount of computation severely limit model deployment. In addition, most general object detection models are designed for natural-scene datasets, and their performance on remote sensing images is unsatisfactory, particularly for densely arranged objects such as ships in port and cars in parking lots, for which traditional horizontal-box detection cannot achieve precise localization. To address these problems, a lightweight rotated-box object detection model for remote sensing images (YOLO-RMV4) is designed.

Method The open-source datasets DOTA2.0, FAIR1M, and HRSC2016 are used as the base datasets, and four common vehicle categories (ship, plane, small vehicle, and large vehicle) are selected as objects. An aerial images of vehicle ship and plane (AVSP) dataset is prepared after preprocessing steps such as filtering, segmentation, conversion, and relabeling. This dataset contains 19,406 images of 1,024×1,024 pixels and 637,466 object instances. The AVSP labels are provided as HBB (horizontal box annotation) and OBB (rotated box annotation), where an OBB is represented by eight parameters. YOLO-RMV4 is built on an improved MobileNetv3 network. An efficient channel attention (ECA) module is added to the feature extraction network, the network scale is appropriately expanded, an SPPF module is added after the feature extraction network, and a path aggregation network (PANet) is added to fuse the backbone features at multiple scales, which provides the network with rich and reliable target features. In the detection head, multiscale detection is used to handle objects of different sizes. Because more than half of the objects in the dataset are small targets, a detection branch at 4× downsampling is added, giving detection at 4×, 8×, 16×, and 32× downsampling, and the small-target loss is given a high weight. The circular smooth label (CSL) is added to the angle prediction in the detection head, which converts the angle regression problem into a classification problem. Thus, the distance between the predicted angle and the true angle can be measured and the angle periodicity problem is solved, which results in precise bounding-box localization. The anchor sizes are designed according to the characteristics of the dataset. Random cropping, flipping, mosaic, and other data augmentation approaches are used in training.

Result Comparative and ablation experiments are carried out on the AVSP dataset. Comparative experiments against seven mainstream lightweight network models verify the effectiveness of the model. Average recall (AR), mean average precision (mAP), parameter count, and detection speed (frames per second, FPS) are used as evaluation metrics and compared across models. The size of YOLO-RMV4 (5.3 MB) is only about 1/8 of that of RYOLOv5l (45.3 MB), yet its mAP and AR are higher than those of RYOLOv5l by 1.2% and 1.6%, respectively. Moreover, the mAP and AR of YOLO-RMV4 are much higher than those of other lightweight network models (EfficientNet and ShuffleNet). YOLO-RMV4 is also compressed and pruned to obtain YOLO-RMV4S, whose size is only 4.5 MB and which also outperforms common lightweight network models in detection precision and recall. Ablation experiments on each improved module verify how much each module improves model performance. The mAP increases by 8.4% after PANet is added: PANet fuses the features of different layers, which largely compensates for the limited feature extraction capability of the lightweight network. After the rotated detection head is added, the mAP increases by 16.8%, greatly improving the detection performance of the model. After the ECA module is added, the mAP increases by 1.6%; the ECA module helps the backbone feature extraction network use its limited capacity and limited number of parameters to learn the feature information of the target objects. After detection at 4× downsampling is added, the mAP increases by 3.0%, which greatly improves performance on small targets. In addition, one module at a time is removed from YOLO-RMV4, and the resulting performance degradation is compared to show the role of each module in the model. Finally, the detection accuracy of each category is analyzed. The mAP and AR are highest for planes, followed by ships and large vehicles, and lowest for small vehicles.

Conclusion YOLO-RMV4 supplements a lightweight network structure with multiscale fusion and rotated-box detection. Thus, the model achieves real-time inference and high-precision detection with an extremely limited number of parameters, which makes it very cost-effective.
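The way a circular smooth label turns angle regression into classification can be illustrated with a minimal Python sketch. It assumes 180 one-degree angle bins and a Gaussian window; the function names and the `radius` and `sigma` values are illustrative assumptions, not settings reported in the abstract.

```python
import math


def circular_smooth_label(angle_deg, num_classes=180, radius=6, sigma=2.0):
    """Encode an angle as a smoothed one-hot vector that wraps around.

    Assumptions (not from the abstract): 1-degree bins, a Gaussian window,
    and the default radius/sigma values.
    """
    target = int(round(angle_deg)) % num_classes
    label = [0.0] * num_classes
    for k in range(num_classes):
        diff = abs(k - target)
        # Circular distance: bin 0 and bin num_classes - 1 are neighbours.
        dist = min(diff, num_classes - diff)
        if dist <= radius:
            label[k] = math.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    return label


def decode_angle(scores):
    """Return the index of the highest-scoring angle bin."""
    return max(range(len(scores)), key=scores.__getitem__)


label = circular_smooth_label(179.0)
print(decode_angle(label), round(label[178], 4), round(label[0], 4))
```

In this sketch, bins 178 and 0 receive the same weight for a 179-degree target, so a prediction that lands just across the periodic boundary is penalized as a near miss rather than as a large error.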


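The effect of adding a detection branch at 4× downsampling can be seen from simple arithmetic on the 1,024×1,024 input size. The sketch below assumes each detection head predicts on a square grid whose side is the image side divided by the stride, as is common in YOLO-style detectors; the 12-pixel object is a hypothetical example size, not a figure from the paper.

```python
def grid_sides(image_side=1024, strides=(4, 8, 16, 32)):
    """Return the side length of the prediction grid at each stride."""
    return {stride: image_side // stride for stride in strides}


def cells_spanned(object_side, stride):
    """Approximate number of grid cells an object covers along one axis."""
    return object_side / stride


sides = grid_sides()
for stride, side in sides.items():
    # Hypothetical 12-pixel vehicle, used only for illustration.
    print(f"stride {stride:>2}: {side}x{side} grid, "
          f"12 px object spans {cells_spanned(12, stride):.2f} cells")
```

Under these assumptions, a 12-pixel object covers three cells per axis on the stride-4 grid but less than half a cell on the stride-32 grid, which is consistent with the abstract's report that the extra branch mainly helps small targets.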


Source journal

Journal of Image and Graphics (中国图象图形学报) Computer Science-Computer Graphics and Computer-Aided Design

CiteScore

1.20

Self-citation rate

0.00%

Articles published

6776

About the journal: Journal of Image and Graphics (ISSN 1006-8961, CN 11-3758/TB, CODEN ZTTXFZ) is an authoritative academic journal supervised by the Chinese Academy of Sciences and co-sponsored by the Institute of Space and Astronautical Information Innovation of the Chinese Academy of Sciences (ISIAS), the Chinese Society of Image and Graphics (CSIG), and the Beijing Institute of Applied Physics and Computational Mathematics (BIAPM). The journal integrates high-tech theories, technical methods, and the industrialisation of applied research results in computer image graphics, and mainly publishes innovative and high-level scientific research papers on basic and applied research in image graphics science and its closely related fields. Paper types include reviews, technical reports, project progress, academic news, new technology reviews, new product introductions, and industrialisation research. The content covers a wide range of fields such as image analysis and recognition, image understanding and computer vision, computer graphics, virtual reality and augmented reality, system simulation, and animation, and theme columns are opened according to research hotspots and cutting-edge topics. Journal of Image and Graphics reaches a wide range of readers, including scientific and technical personnel, enterprise supervisors, and postgraduates and college students engaged in the fields of national defence, military, aviation, aerospace, communications, electronics, automotive, agriculture, meteorology, environmental protection, remote sensing, mapping, oil field, construction, transportation, finance, telecommunications, education, medical care, film and television, and art.
Journal of Image and Graphics is included in many important domestic and international scientific literature database systems, including the EBSCO database in the United States, the JST database in Japan, the Scopus database in the Netherlands, China Science and Technology Thesis Statistics and Analysis (Annual Research Report), China Science Citation Database (CSCD), China Academic Journal Network Publishing Database (CAJD), China Academic Journal Abstracts, Chinese Science Abstracts (Series A), China Electronic Science Abstracts, Chinese Core Journals Abstracts, Chinese Academic Journals on CD-ROM, and China Academic Journals Comprehensive Evaluation Database.