2022 19th Conference on Robots and Vision (CRV): Latest Publications

Occlusion-Aware Self-Supervised Stereo Matching with Confidence Guided Raw Disparity Fusion
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/CRV55824.2022.00025
Xiule Fan, Soo Jeon, B. Fidan
Abstract: Commercially available stereo cameras used in robots and other intelligent systems to obtain depth information typically rely on traditional stereo matching algorithms. Although their raw (predicted) disparity maps contain incorrect estimates, these algorithms can still provide useful prior information towards more accurate prediction. We propose a pipeline to incorporate this prior information to produce more accurate disparity maps. The proposed pipeline includes a confidence generation component to identify raw disparity inaccuracies as well as a self-supervised deep neural network (DNN) to predict disparity and compute the corresponding occlusion masks. The proposed DNN consists of a feature extraction module, a confidence guided raw disparity fusion module to generate an initial disparity map, and a hierarchical occlusion-aware disparity refinement module to compute the final estimates. Experimental results on public datasets verify that the proposed pipeline has competitive accuracy with real-time processing rate. We also test the pipeline with images captured by commercial stereo cameras to show its effectiveness in improving their raw disparity estimates.
Citations: 1
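The fusion step in the abstract above can be pictured as a confidence-weighted blend of the classical matcher's raw disparity with the network's own estimate. The sketch below is illustrative only and is not the paper's actual module: the function name `fuse_disparity` and the simple convex-combination rule are assumptions made here to show the role the confidence map plays.

```python
import torch

def fuse_disparity(d_raw: torch.Tensor,
                   confidence: torch.Tensor,
                   d_net: torch.Tensor) -> torch.Tensor:
    """Confidence-weighted fusion of a raw disparity map with a network estimate.

    d_raw:      (B, 1, H, W) disparity from a classical stereo matcher
    confidence: (B, 1, H, W) values in [0, 1]; low where the raw disparity
                is judged unreliable (e.g. occlusions, textureless regions)
    d_net:      (B, 1, H, W) initial disparity predicted by the network
    """
    # Trust the raw disparity where confidence is high, fall back to the
    # learned estimate elsewhere.
    return confidence * d_raw + (1.0 - confidence) * d_net

if __name__ == "__main__":
    b, h, w = 2, 64, 96
    d_raw = torch.rand(b, 1, h, w) * 64   # toy raw disparities
    conf = torch.rand(b, 1, h, w)         # toy confidence map
    d_net = torch.rand(b, 1, h, w) * 64   # toy network prediction
    print(fuse_disparity(d_raw, conf, d_net).shape)  # torch.Size([2, 1, 64, 96])
```

In the paper the fusion is a learned module inside the DNN and is followed by hierarchical occlusion-aware refinement; this snippet only captures the basic idea of confidence-guided blending.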
Classification of handwritten annotations in mixed-media documents
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/CRV55824.2022.00027
Amanda Dash, A. Albu
Abstract: Handwritten annotations in documents contain valuable information, but they are challenging to detect and identify. This paper addresses this challenge. We propose an algorithm for generating a novel mixed-media document dataset, Annotated Docset, that consists of 14 classes of machine-printed and handwritten elements and annotations. We also propose a novel loss function, Dense Loss, which can correctly identify small objects in complex documents when used in fully convolutional networks (e.g. U-NET, DeepLabV3+). Our Dense Loss function is a compound function that uses local region homogeneity to promote contiguous and smooth segmentation predictions while also using an L1-norm loss to reconstruct the dense-labelled ground truth. By using regression instead of a probabilistic approach to pixel classification, we avoid the pitfalls of training on datasets with small or underrepresented objects. We show that our loss function outperforms other semantic segmentation loss functions for imbalanced datasets, containing few elements that occupy small areas. Experimental results show that the proposed method achieved a mean Intersection-over-Union (mIoU) score of 0.7163 for all document classes and 0.6290 for handwritten annotations, thus outperforming state-of-the-art loss functions.
Citations: 0
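Dense Loss is described above as a compound of an L1 reconstruction term on dense labels and a local-homogeneity term. The sketch below is a hedged approximation of that recipe: the L1 part follows the abstract directly, while the total-variation-style smoothness term and the weight `lambda_smooth` are assumptions standing in for the paper's exact homogeneity measure.

```python
import torch
import torch.nn.functional as F

def dense_loss(pred: torch.Tensor, target: torch.Tensor,
               lambda_smooth: float = 0.1) -> torch.Tensor:
    """Compound regression loss: L1 reconstruction + local homogeneity penalty.

    pred, target: (B, 1, H, W) dense label maps treated as regression targets
    """
    # L1 term reconstructs the dense-labelled ground truth directly.
    l1 = F.l1_loss(pred, target)

    # Local homogeneity term: penalise differences between neighbouring
    # predictions, encouraging contiguous, smooth segmentation regions.
    dx = (pred[..., :, 1:] - pred[..., :, :-1]).abs().mean()
    dy = (pred[..., 1:, :] - pred[..., :-1, :]).abs().mean()

    return l1 + lambda_smooth * (dx + dy)

if __name__ == "__main__":
    pred = torch.rand(2, 1, 128, 128, requires_grad=True)
    target = torch.randint(0, 14, (2, 1, 128, 128)).float()  # 14 document classes
    loss = dense_loss(pred, target)
    loss.backward()
    print(float(loss))
```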
Occluded Text Detection and Recognition in the Wild
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/CRV55824.2022.00026
Z. Raisi, J. Zelek
Abstract: The performance of existing deep-learning scene text recognition methods degrades significantly on occluded text instances, or even partially occluded characters in a text, due to their reliance on the visibility of the target characters in images. This failure is often because the features generated by current architectures have limited robustness to occlusion, which opens the possibility of improving the feature extractors and/or the learning models to better handle these severe occlusions. In this paper, we first evaluate the performance of current scene text detection, scene text recognition, and scene text spotting models using two publicly available occlusion datasets: Occlusion Scene Text (OST), which is designed explicitly for scene text recognition, and an Occluded Character-level Total-Text (OCTT) dataset, which we prepare for evaluating scene text spotting and detection models. We then utilize a very recent Transformer-based framework in deep learning, the Masked Autoencoder (MAE), as a backbone for scene text detection and recognition pipelines to mitigate the occlusion problem. The performance of our scene text recognition and end-to-end scene text spotting models improves with transfer learning on the pre-trained MAE backbone; for example, our recognition model gains 4% in word recognition accuracy on the OST dataset. On the OCTT dataset, our end-to-end text spotting model achieves an F-measure of 68.5%, outperforming state-of-the-art methods, when equipped with an MAE backbone rather than a convolutional neural network (CNN) backbone.
Citations: 1
Proceedings 2022 19th Conference on Robots and Vision
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/crv55824.2022.00001
{"title":"Proceedings 2022 19th Conference on Robots and Vision","authors":"","doi":"10.1109/crv55824.2022.00001","DOIUrl":"https://doi.org/10.1109/crv55824.2022.00001","url":null,"abstract":"","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"257 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133557166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Instance Segmentation of Herring and Salmon Schools in Acoustic Echograms using a Hybrid U-Net
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/CRV55824.2022.00010
Alex L. Slonimer, Melissa Cote, T. Marques, A. Rezvanifar, S. Dosso, A. Albu, Kaan Ersahin, T. Mudge, S. Gauthier
Abstract: The automated classification of fish, such as herring and salmon, in multi-frequency echograms is important for ecosystems monitoring. This paper implements a novel approach to instance segmentation: a hybrid of deep-learning and heuristic methods. This approach implements semantic segmentation by a U-Net to detect fish, which are converted to instances of fish-schools derived from candidate components within a defined linking distance. In addition to four frequency channels of echogram data (67.5, 125, 200, 455 kHz), two simulated channels (water depth and solar elevation angle) are included to encode spatial and temporal information, which leads to substantial improvement in model performance. The model is shown to outperform recent experiments that have used a Mask R-CNN architecture. This approach demonstrates the ability to classify sparsely distributed objects in a way that is not possible with state-of-the-art instance segmentation methods.
Citations: 1
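The heuristic half of the hybrid pipeline above, deriving school instances from candidate components within a defined linking distance, can be illustrated with a small grouping routine. The sketch below assumes centroid-to-centroid distance as the linking criterion and uses `scipy.ndimage`; the paper's exact linking rule may differ, and the function name `group_into_schools` is hypothetical.

```python
import numpy as np
from scipy import ndimage

def group_into_schools(mask: np.ndarray, linking_distance: float) -> np.ndarray:
    """Group connected components of a binary fish mask into school instances.

    Components whose centroids lie within `linking_distance` pixels of each
    other are merged into a single instance (simple union-find).
    """
    labels, n = ndimage.label(mask)
    if n == 0:
        return labels
    centroids = np.array(ndimage.center_of_mass(mask, labels, range(1, n + 1)))

    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge pairs of components that are close enough to belong to one school.
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(centroids[i] - centroids[j]) <= linking_distance:
                parent[find(i)] = find(j)

    instance = np.zeros_like(labels)
    roots = {}
    for comp in range(1, n + 1):
        root = find(comp - 1)
        roots.setdefault(root, len(roots) + 1)
        instance[labels == comp] = roots[root]
    return instance

if __name__ == "__main__":
    mask = np.zeros((40, 40), dtype=np.uint8)
    mask[5:10, 5:10] = 1      # component A
    mask[12:15, 12:15] = 1    # component B, near A -> same school as A
    mask[30:35, 30:35] = 1    # component C, far away -> separate school
    print(np.unique(group_into_schools(mask, linking_distance=15.0)))  # [0 1 2]
```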
Program Committee: CRV 2022
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/crv55824.2022.00007
{"title":"Program Committee: CRV 2022","authors":"","doi":"10.1109/crv55824.2022.00007","DOIUrl":"https://doi.org/10.1109/crv55824.2022.00007","url":null,"abstract":"","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133879922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Lasso Method for Multi-Robot Foraging
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/CRV55824.2022.00022
A. Vardy
Abstract: We propose a novel approach to multi-robot foraging. This approach makes use of a scalar field to guide robots throughout an environment while gathering objects towards the goal. The environment must be planar with a closed, contiguous boundary. However, the boundary's shape can be arbitrary. Conventional robot foraging methods assume an open environment or a simple boundary that never impedes the robots, a limitation which our method overcomes. Our distributed control algorithm causes the robots to circumnavigate the environment and nudge objects inwards towards the goal. We demonstrate the performance of our approach using real-world and simulated experiments and study the impact of the number of robots, the complexity of the boundary, and limitations on the sensing range.
Citations: 2
An Exact Fast Fourier Method for Morphological Dilation and Erosion Using the Umbra Technique
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/CRV55824.2022.00032
V. Sridhar, M. Breuß
Abstract: In this paper we consider dilation and erosion, the fundamental operations of mathematical morphology. It is well known that many powerful image filtering operations can be constructed from their combinations. We propose a fast and novel algorithm based on the Fast Fourier Transform to compute grey-value morphological operations on an image. The novel method can handle non-flat filters and places no restrictions on the shape and size of the filtering window, in contrast to many other fast methods in the field. Unlike fast Fourier techniques from previous works, the novel method gives exact results and is not an approximation. The key to achieving this is to exploit, for the first time in this context, the umbra formulation of images and filters. We show that the new method is in practice particularly suitable for filtering images with a small tonal range or when employing large filter sizes.
Citations: 1
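For reference, the grey-value dilation and erosion the abstract refers to, together with the umbra construction it exploits, are the standard definitions below; the FFT evaluation itself (correlating indicator functions of the umbras and thresholding) is only summarised in the final comment, and the full algorithm is in the paper.

```latex
% Grey-value dilation and erosion of an image f by a structuring function b:
\[
(f \oplus b)(x) \;=\; \max_{y \in B} \bigl\{ f(x - y) + b(y) \bigr\}, \qquad
(f \ominus b)(x) \;=\; \min_{y \in B} \bigl\{ f(x + y) - b(y) \bigr\}.
\]
% The umbra of f is the set of all points on or below its graph:
\[
U[f] \;=\; \bigl\{ (x, t) : t \le f(x) \bigr\},
\]
% and, by the umbra homomorphism property, dilation can be expressed as
\[
f \oplus b \;=\; \operatorname{top}\!\bigl( U[f] \oplus U[b] \bigr),
\]
% where the binary dilation of the umbras can be evaluated exactly via an
% FFT-based correlation of their indicator functions followed by thresholding.
```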
Semi-supervised Grounding Alignment for Multi-modal Feature Learning
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/CRV55824.2022.00015
Shih-Han Chou, Zicong Fan, J. Little, L. Sigal
Abstract: Self-supervised transformer-based architectures, such as ViLBERT [1] and others, have recently emerged as dominant paradigms for multi-modal feature learning. Such architectures leverage large-scale datasets (e.g., Conceptual Captions [2]) and, typically, image-sentence pairings, for self-supervision. However, conventional multi-modal feature learning requires huge datasets and computing for both pre-training and fine-tuning to the target task. In this paper, we illustrate that more granular semi-supervised alignment at a region-phrase level is an additional useful cue and can further improve the performance of such representations. To this end, we propose a novel semi-supervised grounding alignment loss, which leverages an off-the-shelf pre-trained phrase grounding model for pseudo-supervision (by producing region-phrase alignments). This semi-supervised formulation enables better feature learning in the absence of any additional human annotations on the large-scale (Conceptual Captions) dataset. Further, it shows an even larger margin of improvement on smaller data splits, leading to effective data-efficient feature learning. We illustrate the superiority of the learned features by fine-tuning the resulting models to multiple vision-language downstream tasks: visual question answering (VQA), visual commonsense reasoning (VCR), and visual grounding. Experiments on the VQA, VCR, and grounding benchmarks demonstrate improvements of up to 1.3% in accuracy (in visual grounding) with large-scale training, and up to 5.9% (in VQA) with 1/8 of the data for pre-training and fine-tuning. (We will release the code and all pre-trained models upon acceptance.)
Citations: 3
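One plausible form of a region-phrase alignment loss driven by pseudo-labels is a cross-entropy over region-phrase similarity scores, with the region chosen by the off-the-shelf grounding model acting as the target class. The sketch below is an assumption of this kind, not the loss defined in the paper; the name `grounding_alignment_loss` and the scaled dot-product similarity are illustrative choices.

```python
import torch
import torch.nn.functional as F

def grounding_alignment_loss(region_feats: torch.Tensor,
                             phrase_feats: torch.Tensor,
                             pseudo_region_idx: torch.Tensor) -> torch.Tensor:
    """Region-phrase alignment loss driven by pseudo-labels.

    region_feats:      (B, R, D) region features from the image stream
    phrase_feats:      (B, P, D) phrase features from the text stream
    pseudo_region_idx: (B, P) index of the region each phrase is pseudo-aligned
                       to by an off-the-shelf phrase grounding model
    """
    # Similarity of every phrase to every region (scaled dot product).
    sim = torch.einsum("bpd,brd->bpr", phrase_feats, region_feats)
    sim = sim / region_feats.shape[-1] ** 0.5

    # Treat the pseudo-aligned region as the correct class for each phrase.
    return F.cross_entropy(sim.flatten(0, 1), pseudo_region_idx.flatten())

if __name__ == "__main__":
    B, R, P, D = 2, 36, 5, 768
    loss = grounding_alignment_loss(torch.randn(B, R, D),
                                    torch.randn(B, P, D),
                                    torch.randint(0, R, (B, P)))
    print(float(loss))
```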
Safe Landing Zones Detection for UAVs Using Deep Regression
2022 19th Conference on Robots and Vision (CRV) Pub Date: 2022-05-01 DOI: 10.1109/CRV55824.2022.00035
Sakineh Abdollahzadeh, Pier-Luc Proulx, M. S. Allili, J. Lapointe
Abstract: Finding safe landing zones (SLZ) in urban areas and natural scenes is one of the many challenges that must be overcome in automating Unmanned Aerial Vehicle (UAV) navigation. Using passive vision sensors to achieve this objective is a very promising avenue due to their low cost and the potential they provide for performing simultaneous terrain analysis and 3D reconstruction. In this paper, we propose using a deep learning approach on UAV imagery to assess the SLZ. The model is built on a semantic segmentation architecture whereby thematic classes of the terrain are mapped into safety scores for UAV landing. Contrary to past methods, which use hard classification into safe/unsafe landing zones, our approach provides a continuous safety map that is more practical for an emergency landing. Experiments on public datasets have shown promising results.
Citations: 1
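The mapping from thematic terrain classes to a continuous safety score can be pictured as a per-class weighting of a segmentation network's class probabilities. The class list and safety values in the sketch below are illustrative assumptions, not the paper's, and the paper trains a deep regression model rather than post-processing a classifier; the snippet only shows how a class-to-score mapping yields a continuous safety map.

```python
import torch

# Illustrative per-class safety scores in [0, 1]; the actual classes and
# values used in the paper may differ.
CLASS_SAFETY = torch.tensor([
    0.9,  # paved ground
    0.8,  # grass
    0.3,  # water
    0.1,  # trees
    0.0,  # buildings / people / vehicles
])

def safety_map(seg_logits: torch.Tensor) -> torch.Tensor:
    """Convert per-pixel class logits (B, C, H, W) into a continuous safety
    map (B, H, W) by weighting class probabilities with per-class scores."""
    probs = seg_logits.softmax(dim=1)                         # (B, C, H, W)
    return torch.einsum("bchw,c->bhw", probs, CLASS_SAFETY)   # (B, H, W)

if __name__ == "__main__":
    logits = torch.randn(1, CLASS_SAFETY.numel(), 128, 128)   # toy network output
    s = safety_map(logits)
    print(s.shape, float(s.min()), float(s.max()))
```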