Multimodal Interactive Supervised Pedestrian Detection Based on yolov5

2023 3rd International Symposium on Computer Technology and Information Science (ISCTIS) | Pub Date: 2023-07-07 | DOI: 10.1109/ISCTIS58954.2023.10213155

Mingyue Li, Lianzhong Wang, Zhe Zheng, Wenpeng Cui, Rui Liu, Yingying Chi


Citations: 0


Multimodal Interactive Supervised Pedestrian Detection Based on yolov5

Pedestrian detection and recognition is a fundamental environment-perception task in autonomous driving. Many existing methods detect pedestrians from visible-light (RGB) images alone, but RGB-only detection performs poorly in harsh conditions such as cloud, rain, fog, and low light. Fusion detection based on RGB and thermal images has therefore received increasing attention in recent years, although open problems remain in the fusion strategy. The goal of RGB-thermal pedestrian detection is to fuse complementary visible-light and thermal-infrared information to improve detection performance in both day and night environments [1]. Much prior work [2]–[6] has integrated the two modalities effectively and achieved some results. However, these methods feed the fused features directly into the detector without considering that the fused features may be of low quality. Improving the quality of the fused features is therefore crucial for multimodal pedestrian detection in autonomous driving. The Cross-Modal Supervision (CMS) model designed and implemented in this paper is evaluated on the public KAIST dataset [7]. Experimental results show that the cross-modal supervised model reaches an accuracy of 53.68% on the KAIST dataset and lowers the miss rate to 46.32%.
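The two reported figures sum to 100%, which is what one would expect if the "accuracy" is the fraction of ground-truth pedestrians detected and the miss rate is its complement. The abstract does not state the evaluation protocol, so this reading is an assumption. The sketch below illustrates that relationship in Python; the function names and the counts are hypothetical and were chosen only to reproduce the reported percentages. They are not data from the paper.

```python
def miss_rate(true_positives: int, false_negatives: int) -> float:
    """Fraction of ground-truth pedestrians that the detector failed to find."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("at least one ground-truth pedestrian is required")
    return false_negatives / total


def detection_rate(true_positives: int, false_negatives: int) -> float:
    """Complement of the miss rate: fraction of pedestrians that were found."""
    return 1.0 - miss_rate(true_positives, false_negatives)


# Hypothetical counts (NOT from the paper), picked so the ratios match
# the percentages quoted in the abstract.
tp, fn = 5368, 4632

mr = miss_rate(tp, fn)
dr = detection_rate(tp, fn)
print(f"detection rate = {dr:.2%}, miss rate = {mr:.2%}")
```

Benchmarks on the KAIST dataset are often reported as a log-average miss rate over a range of false positives per image, which is a different quantity from the single-operating-point ratio sketched here.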

