Intelligent Vehicle Environment Scene Parsing Method Based on Multi-tasking Convolutional Neural Network
Authors: J. Lian, Yuhang Yin, Jiahao Pi, Yuekai Yang
Venue: 2020 4th CAA International Conference on Vehicular Control and Intelligence (CVCI)
Publication date: 2020-12-18
DOI: 10.1109/CVCI51460.2020.9338621
URL: https://doi.org/10.1109/CVCI51460.2020.9338621
Citation count: 0
Abstract
This paper presents an encoder-decoder convolutional neural network that integrates multi-class semantic segmentation and multi-object detection to improve the efficiency and depth of scene parsing for intelligent vehicles. The encoder uses a multi-scale structure to improve real-time performance while maintaining accuracy. The decoders comprise a semantic segmentation subnetwork and an object detection subnetwork, which share the encoder feature maps to improve computational efficiency. During training, we use FPS (frames per second) and MIoU (mean Intersection over Union) as the evaluation metrics for semantic segmentation, and mAP (mean Average Precision) and FPS as the evaluation metrics for object detection. We train the network both separately and jointly to evaluate its performance. Experimental results show that the proposed network performs multi-class semantic segmentation and multi-object detection simultaneously, with better real-time performance and richer feature information, which makes deployment on real vehicles feasible.
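The MIoU metric named in the abstract can be illustrated with a minimal sketch. The function name `mean_iou`, the flattened label lists, and the choice to skip classes that appear in neither the prediction nor the ground truth are assumptions made here for illustration; the abstract does not state which convention the authors used.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union over flattened per-pixel class labels.

    Assumption: classes absent from both prediction and ground truth are
    skipped when averaging (one common convention, not taken from the paper).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes   # pixels where both agree on class c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In practice the per-class counts would be accumulated over a whole validation set before averaging, rather than computed per image.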