Multiview random forest of local experts combining RGB and LIDAR data for pedestrian detection

2015 IEEE Intelligent Vehicles Symposium (IV). Pub Date: 2015-08-27. DOI: 10.1109/IVS.2015.7225711

Alejandro González, Gabriel Villalonga, Jiaolong Xu, David Vázquez, J. Amores, Antonio M. López


Citations: 81

Abstract


Despite significant recent advances, pedestrian detection remains an extremely challenging problem in real scenarios. To develop a detector that operates successfully under these conditions, it becomes critical to leverage multiple cues, multiple imaging modalities, and a strong multi-view classifier that accounts for different pedestrian views and poses. In this paper we provide an extensive evaluation that gives insight into how each of these aspects (multi-cue, multi-modality, and strong multi-view classifier) affects performance, both individually and when integrated together. In the multi-modality component we explore the fusion of RGB and depth maps obtained by high-definition LIDAR, a modality that has only recently started to receive attention. As our analysis reveals, although all of the aforementioned aspects significantly improve performance, the fusion of visible-spectrum and depth information boosts accuracy by a much larger margin. The resulting detector not only ranks among the top performers on the challenging KITTI benchmark, but is also built from very simple blocks that are easy to implement and computationally efficient. These simple blocks can easily be replaced with more sophisticated, recently proposed ones, such as convolutional neural networks for feature representation, to further improve accuracy.
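The abstract does not say how the modalities and views are combined, so the following is only an illustrative sketch, not the authors' implementation. It assumes feature-level (early) fusion, where the RGB and depth descriptors of a candidate window are concatenated, and it assumes the multi-view step keeps the highest score among view-specific models. Plain linear scorers stand in for the paper's random forest of local experts, and every name, weight, and view label here is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical type: a view-specific model is (weights, bias) of a linear scorer.
# The paper uses a random forest of local experts; a linear scorer is swapped in
# here only to keep the sketch short and dependency-free.
LinearModel = Tuple[Sequence[float], float]


def fuse_features(rgb_feat: Sequence[float], depth_feat: Sequence[float]) -> List[float]:
    """Early (feature-level) fusion: concatenate the two modality descriptors."""
    return list(rgb_feat) + list(depth_feat)


def linear_score(model: LinearModel, x: Sequence[float]) -> float:
    """Dot product of weights and features, plus bias."""
    weights, bias = model
    if len(weights) != len(x):
        raise ValueError("weight and feature dimensions differ")
    return sum(w * v for w, v in zip(weights, x)) + bias


def multiview_score(view_models: Dict[str, LinearModel],
                    x: Sequence[float]) -> Tuple[str, float]:
    """Score the window with every view-specific model and keep the best one."""
    if not view_models:
        raise ValueError("at least one view model is required")
    best_view = max(view_models, key=lambda v: linear_score(view_models[v], x))
    return best_view, linear_score(view_models[best_view], x)


# Toy descriptors for one candidate window (made-up numbers).
rgb_descriptor = [1.0, 0.0]
depth_descriptor = [0.5]
fused = fuse_features(rgb_descriptor, depth_descriptor)

# Hypothetical view-specific models over the fused 3-dimensional descriptor.
models = {
    "frontal": ([1.0, 0.0, 2.0], -1.0),
    "side": ([0.0, 1.0, 1.0], 0.0),
}
view, score = multiview_score(models, fused)
print(view, score)
```

In a real detector the descriptors would come from the cues computed on each modality, and the per-view models would be trained classifiers; the max-over-views rule is just one simple way to combine them.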
