Title: PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms
Authors: Alexandru Toma, Hao-Ya Hsueh, H. Jaafar, Riku Murai, P. Kelly, Sajad Saeedi
DOI: https://doi.org/10.1109/CRV52889.2021.00019
Venue: 2021 18th Conference on Robots and Vision (CRV)

Abstract: Path planning is a key component in mobile robotics. A wide range of path planning algorithms exist, but few attempts have been made to benchmark the algorithms holistically or unify their interface. Moreover, with the recent advances in deep neural networks, there is an urgent need to facilitate the development and benchmarking of such learning-based planning algorithms. This paper presents PathBench, a platform for developing, visualizing, training, testing, and benchmarking existing and future classical and learned 2D and 3D path planning algorithms, while offering support for the Robot Operating System (ROS). Many existing path planning algorithms are supported (e.g., A*, wavefront, rapidly-exploring random tree, value iteration networks, and gated path planning networks), and integrating new algorithms is easy and clearly specified. We demonstrate the benchmarking capability of PathBench by comparing implemented classical and learned algorithms on metrics such as path length, success rate, computational time, and path deviation. These evaluations are done on built-in PathBench maps and on external path planning environments from video games and real-world databases. PathBench is open source.
{"title":"To Keystone or Not to Keystone, that is the Correction","authors":"K. Dick, J. Tanner, J. Green","doi":"10.1109/CRV52889.2021.00027","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00027","url":null,"abstract":"\"To Keystone or not to Keystone, that is the correction\"... and indeed the question! Outside of highly constrained conditions, the vast majority of photographed imagery of the natural environment is taken non-square to the objects that they represent Consequently, those objects appearing at a distorted perspective may be computationally corrected via Keystone Correction. This disparity is frequently observed when considering imagery sourced from vehicle-mounted cameras, such as those levied in autonomous vehicle infrastructure or by streetscape collection initiatives such as Google Street View. As visual creatures, the lived environment proximal to roadways is filled with text- and numeric-based advertisements vying for our attention and, conveniently, this signage isn’t placed perpendicular to a vehicle’s forward-facing camera. Given the perspective distortion of the text and/or values contained therein, their automated detection and reading may benefit from Keystone correction. In this work, we address the yet-unanswered question: what benefit might we expect from Keystone correction preprocessing of images? We do not explicitly promote the use of Keystone correction but rather, evaluate its utility within a prediction pipeline. To this end, we leverage the Gas Prices of America (GPA) dataset containing multi-digit, multi-price values and the French Street Sign Names (FSNS) multi-word text dataset given their known geometry enabling the automation of image Keystone correction. We compare the outcomes of $color{Magenta}{text{Keystoned}}$ imagery versus $color{Blue}{text{non - Keystoned}}$ imagery along five axes: 1) predictive performance, 2) annotation correctness, 3) algorithmic computational complexity and empirical time estimation, 4) image scaling, and 5) degree of perspective transform. From our findings, we arrive at several recommendations on both the benefit & burden of Keystone correction to inform future research on extracting information in the wild.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129886426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robotic Object Manipulation with Full-Trajectory GAN-Based Imitation Learning","authors":"Haoxu Wang, D. Meger","doi":"10.1109/CRV52889.2021.00016","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00016","url":null,"abstract":"This paper develops a novel generative imitation learning system capable of capturing the distribution of expert demonstrations in trajectory space, which allows longer temporal context within complex motion sequences to be captured. While auto-regressive models that model time-steps sequentially can in principle be recursively applied to capture long sequences, there are known issues with learning such models reliably. In contrast, our model represents full trajectories a first-class entities, which has required us to adapt the typical generative adversarial learning architecture. We pair a full-trajectory discriminator with an imitation-inspired generative trajectory model and train these two in adversarial fashion. Our results show that our method matches the performance of existing approaches for simple tasks, in simulation and on real robot deployments. We produce state-of-the-art accuracy in replicating motions that contain long-term dependencies such as pouring.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127642304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Building Facades to Normal Maps: Adversarial Learning from Single View Images
Authors: Mukul Khanna, Tanu Sharma, Ayyappa Swamy Thatavarthy, K. Krishna
DOI: https://doi.org/10.1109/CRV52889.2021.00009
Venue: 2021 18th Conference on Robots and Vision (CRV)

Abstract: Surface normal estimation is an essential component of several computer and robot vision pipelines. While this problem has been extensively studied, most approaches are geared towards indoor scenes and often rely on multiple modalities (depth, multiple views) for accurate estimation of normal maps. Outdoor scenes pose a greater challenge, as they exhibit significant lighting variation, often contain occluders, and feature structures like building facades that are ridden with numerous windows and protrusions. Conventional supervised learning schemes excel in indoor scenes but do not exhibit competitive performance when trained and deployed in outdoor environments; furthermore, they involve complex network architectures and require many more trainable parameters. To tackle these challenges, we present an adversarial learning scheme that regularizes the output normal maps from a neural network to appear more realistic, using a small number of precisely annotated examples. Our method uses a lightweight and simpler architecture, while improving performance by at least 1.5x across most metrics. We evaluate our approach against the state of the art in normal map estimation on a synthetic and a real outdoor dataset, and observe significant performance enhancements.
{"title":"Uncertainty-Aware Policy Sampling and Mixing for Safe Interactive Imitation Learning","authors":"Manfred Diaz, T. Fevens, L. Paull","doi":"10.1109/CRV52889.2021.00018","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00018","url":null,"abstract":"Teaching robots how to execute tasks through demonstrations is appealing since it sidesteps the need to explicitly specify a reward function. However, posing imitation learning as a simple supervised learning problem suffers from the well-known problem of distributional shift - the teacher will only demonstrate the optimal trajectory and therefore the learner is unable to recover if it deviates even slightly from this trajectory since it has no training data for this case. This problem has been overcome in the literature by some element of interactivity in the learning process - usually be somehow interleaving the execution of the learner and the teacher so that the teacher can demonstrate to the learner also how to recover from mistakes. In this paper, we consider the cases where the robot has the potential to do harm, and therefore safety must be imposed at every step in the learning process. We show that uncertainty is an appropriate measure of safety and that both the mixing of the policies and the data sampling procedure benefit from considering the uncertainty of both the learner and the teacher. Our method, uncertainty-aware policy sampling and mixing (UPMS), is used to teach an agent to drive down a lane with less safety violations and less queries to the teacher than state-of-the-art methods.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121110480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Waypoint Planning Networks
Authors: Alexandru Toma, H. Jaafar, Hao-Ya Hsueh, Stephen James, Daniel Lenton, R. Clark, Sajad Saeedi
DOI: https://doi.org/10.1109/CRV52889.2021.00020
Venue: 2021 18th Conference on Robots and Vision (CRV)

Abstract: With the recent advances in machine learning, path planning algorithms are also evolving; however, learned path planning algorithms often have difficulty competing with the success rates of classic algorithms. We propose waypoint planning networks (WPN), a hybrid algorithm based on LSTMs that combines a local kernel (a classic algorithm such as A*) with a global kernel using a learned algorithm. WPN produces a more computationally efficient and robust solution. We compare WPN against A* as well as related works, including motion planning networks (MPNet) and value iteration networks (VIN). In this paper, the design and experiments have been conducted for 2D environments. Experimental results outline the benefits of WPN in both efficiency and generalization. We show that WPN's search space is considerably smaller than A*'s, while still generating near-optimal results. Additionally, WPN works on partial maps, unlike A*, which needs the full map in advance. The code is available online.
{"title":"Multi-Resolution and Multi-Domain Analysis of Off-Road Datasets for Autonomous Driving","authors":"Orighomisan Mayuku, B. Surgenor, J. Marshall","doi":"10.1109/CRV52889.2021.00030","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00030","url":null,"abstract":"For use in off-road autonomous driving applications, we propose and study the use of multi-resolution local binary pattern texture descriptors to improve overall semantic segmentation performance and reduce class imbalance effects in off-road visual datasets. Our experiments, using a challenging publicly available off-road dataset as well as our own off-road dataset, show that texture features provide added flexibility towards reducing class imbalance effects, and that fusing color and texture features can improve segmentation performance. Finally, we demonstrate domain adaptation limitations in nominally similar off-road environments by cross-comparing the segmentation performance of convolutional neural networks trained on both datasets.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124910720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Calibration of the Offset Between GPS and Semantic Map Frames for Robust Localization","authors":"Wei-Kang Tseng, Angela P. Schoellig, T. Barfoot","doi":"10.1109/CRV52889.2021.00031","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00031","url":null,"abstract":"In self-driving, standalone GPS is generally considered to have insufficient positioning accuracy to stay in lane. Instead, many turn to LIDAR localization, but this comes at the expense of building LIDAR maps that can be costly to maintain. Another possibility is to use semantic cues such as lane lines and traffic lights to achieve localization, but these are usually not continuously visible. This issue can be remedied by combining semantic cues with GPS to fill in the gaps. However, due to elapsed time between mapping and localization, the live GPS frame can be offset from the semantic map frame, requiring calibration. In this paper, we propose a robust semantic localization algorithm that self-calibrates for the offset between the live GPS and semantic map frames by exploiting common semantic cues, including traffic lights and lane markings. We formulate the problem using a modified Iterated Extended Kalman Filter, which incorporates GPS and camera images for semantic cue detection via Convolutional Neural Networks. Experimental results show that our proposed algorithm achieves decimetre-level accuracy comparable to typical LIDAR localization performance and is robust against sparse semantic features and frequent GPS dropouts.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125503779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequential Fusion via Bounding Box and Motion PointPainting for 3D Objection Detection","authors":"Anas Mahmoud, Steven L. Waslander","doi":"10.1109/CRV52889.2021.00013","DOIUrl":"https://doi.org/10.1109/CRV52889.2021.00013","url":null,"abstract":"Due to the complementary characteristics of camera and LiDAR data, recent research efforts have been focused on designing 3D object detectors capable of fusing images and point clouds. However, LiDAR-based detectors currently achieve better performance on KITTI and Waymo benchmark datasets [1], [2] when compared to fusion methods. This result is counter-intuitive, as fusing information from the two modalities should result in performance that at least matches the performance of LiDAR-only methods. Pointpainting [3] attempts to address this gap by sequential fusion, which solves the issue of misalignment between image view and LiDAR BEV. In this paper, we propose class-aware and class-agnostic point painting methods which employ predicted bounding boxes from image-based 2D object detectors to extract coarse image semantics instead of full scene semantic segmentation used in [3]. In addition, a motion point painting method is proposed to fuse motion cues as a way to focus attention on dynamic objects when they can be reliably distinguished from the scene, as is the case when the sensors are static. Our experiments on KITTI [1] show a 3% mAP improvement on car class for bounding box methods compared to PointPainting [3]. In addition, motion painting shows an improvement of 1.45% mAP for car class and 2.99% for pedestrian class on our proprietary traffic dataset. Finally, we conduct a range-binned evaluation on KITTI dataset using two different LiDAR stream and show that relative gain of sequential fusion methods is dependent on the selected LiDAR stream.","PeriodicalId":413697,"journal":{"name":"2021 18th Conference on Robots and Vision (CRV)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125118806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Accurate Outdoor Ground Truth Based on Total Stations
Authors: Maxime Vaidis, P. Giguère, F. Pomerleau, V. Kubelka
DOI: https://doi.org/10.1109/CRV52889.2021.00012
Venue: 2021 18th Conference on Robots and Vision (CRV)

Abstract: In robotics, accurate ground-truth positioning has fostered the development of mapping and localization algorithms through the creation of cornerstone datasets. In outdoor environments and over long distances, total stations are the most accurate and precise measurement instruments for this purpose. Most total-station-based systems in the literature are limited to three degrees of freedom (DOFs), due to the use of a single-prism tracking approach. In this paper, we present preliminary work on measuring the full pose of a vehicle, bringing the referencing system to six DOFs. Three total stations are used to track, in real time, three prisms attached to a target platform. We describe the structure of the referencing system and the protocol for acquiring ground truth with it. We evaluated its precision in a variety of outdoor environments, ranging from open sky to forest trails, and compared the system with another popular source of reference positioning, the Real-Time Kinematic (RTK) solution. Results show that our approach is the most precise, reaching an average position error of 10 mm and orientation error of 0.6 deg. This difference in performance was particularly stark in environments where Global Navigation Satellite System (GNSS) signals can be weakened by overhanging vegetation.