Multimodal CNN Pedestrian Classification: A Study on Combining LIDAR and Camera Data

Gledson Melotti, C. Premebida, Nuno Gonçalves, U. Nunes, D. Faria
{"title":"Multimodal CNN Pedestrian Classification: A Study on Combining LIDAR and Camera Data","authors":"Gledson Melotti, C. Premebida, Nuno Gonçalves, U. Nunes, D. Faria","doi":"10.1109/ITSC.2018.8569666","DOIUrl":null,"url":null,"abstract":"This paper presents a study on pedestrian classification based on deep learning using data from a monocular camera and a 3D LIDAR sensor, separately and in combination. Early and late multi-modal sensor fusion approaches are revisited and compared in terms of classification performance. The problem of pedestrian classification finds applications in advanced driver assistance system (ADAS) and autonomous driving, and it has regained particular attention recently because, among other reasons, safety involving self-driving vehicles. Convolutional Neural Networks (CNN) is used in this work as classifier in distinct situations: having a single sensor data as input, and by combining data from both sensors in the CNN input layer. Range (distance) and intensity (reflectance) data from LIDAR are considered as separate channels, where data from the LIDAR sensor is feed to the CNN in the form of dense maps, as the result of sensor coordinate transformation and spatial filtering; this allows a direct implementation of the same CNN-based approach on both sensors data. In terms of late-fusion, the outputs from individual CNNs are combined by means of learning and non-learning approaches. Pedestrian classification is evaluated on a ‘binary classification’ dataset created from the KITTI Vision Benchmark Suite, and results are shown for each sensor-modality individually, and for the fusion strategies.","PeriodicalId":395239,"journal":{"name":"2018 21st International Conference on Intelligent Transportation Systems (ITSC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 21st International Conference on Intelligent Transportation Systems (ITSC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITSC.2018.8569666","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 34

Abstract

This paper presents a study on pedestrian classification based on deep learning using data from a monocular camera and a 3D LIDAR sensor, separately and in combination. Early and late multi-modal sensor fusion approaches are revisited and compared in terms of classification performance. The problem of pedestrian classification finds applications in advanced driver assistance systems (ADAS) and autonomous driving, and it has regained particular attention recently because of, among other reasons, safety concerns involving self-driving vehicles. Convolutional Neural Networks (CNNs) are used in this work as classifiers in distinct situations: with data from a single sensor as input, and with data from both sensors combined in the CNN input layer. Range (distance) and intensity (reflectance) data from the LIDAR are treated as separate channels, and the LIDAR data are fed to the CNN in the form of dense maps obtained through sensor coordinate transformation and spatial filtering; this allows a direct implementation of the same CNN-based approach on both sensors' data. For late fusion, the outputs of the individual CNNs are combined by means of learning and non-learning approaches. Pedestrian classification is evaluated on a 'binary classification' dataset created from the KITTI Vision Benchmark Suite, and results are reported for each sensor modality individually and for the fusion strategies.
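To make the LIDAR preprocessing concrete, the sketch below shows one way to project 3D LIDAR points into the camera image and build dense range and reflectance maps. It assumes a KITTI-style combined projection matrix and uses linear interpolation as a stand-in for the paper's spatial filtering step; the function and variable names are hypothetical and this is not the authors' exact implementation.

```python
import numpy as np
from scipy.interpolate import griddata


def lidar_to_dense_maps(points, reflectance, proj, img_shape):
    """Project LIDAR points into the image plane and densify range/reflectance.

    points      -- (N, 3) LIDAR coordinates
    reflectance -- (N,) intensity returns
    proj        -- (3, 4) combined projection matrix (e.g. P_rect @ R_rect @ Tr_velo_to_cam in KITTI)
    img_shape   -- (H, W) of the camera image
    """
    h, w = img_shape
    pts_h = np.hstack([points, np.ones((len(points), 1))])   # homogeneous coordinates, (N, 4)
    uvw = (proj @ pts_h.T).T                                  # image-plane coordinates, (N, 3)

    in_front = uvw[:, 2] > 0                                  # keep points ahead of the camera
    uv = uvw[in_front, :2] / uvw[in_front, 2:3]               # perspective division -> pixel (u, v)
    rng = np.linalg.norm(points[in_front], axis=1)            # Euclidean range per point
    refl = reflectance[in_front]

    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv, rng, refl = uv[inside], rng[inside], refl[inside]

    # Densify the sparse projected samples over the full pixel grid
    # (an assumed stand-in for the spatial filtering described in the abstract).
    grid_u, grid_v = np.meshgrid(np.arange(w), np.arange(h))
    range_map = griddata(uv, rng, (grid_u, grid_v), method="linear", fill_value=0.0)
    refl_map = griddata(uv, refl, (grid_u, grid_v), method="linear", fill_value=0.0)
    return range_map, refl_map
```

The resulting dense maps have the same spatial layout as the camera image, which is what allows the same CNN architecture to be applied to either modality, or to channel-stacked inputs for early fusion.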
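For the late-fusion side, the following minimal sketch combines per-modality pedestrian scores with a non-learning rule (averaging) and with a small learned combiner. Logistic regression is shown only as one plausible learning-based choice; the abstract does not specify which learned combiner the authors used, so the model and names here are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fuse_average(score_rgb, score_range, score_refl):
    """Non-learning fusion: mean of the per-modality pedestrian probabilities."""
    return np.mean([score_rgb, score_range, score_refl], axis=0)


def fit_learned_fusion(train_scores, train_labels):
    """Learning-based fusion: fit a combiner on stacked per-modality CNN scores.

    train_scores -- (N, 3) array of [rgb, range, reflectance] scores
    train_labels -- (N,) binary pedestrian / non-pedestrian labels
    """
    combiner = LogisticRegression()
    combiner.fit(train_scores, train_labels)
    return combiner


# Example usage (hypothetical scores from three single-modality CNNs):
# fused = fuse_average(0.92, 0.71, 0.64)
# combiner = fit_learned_fusion(val_scores, val_labels)
# fused_learned = combiner.predict_proba(test_scores)[:, 1]
```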