A Deep Moving-camera Background Model

Computer vision - ECCV ... : ... European Conference on Computer Vision : proceedings. European Conference on Computer Vision
Pub Date: 2022-09-16
DOI: 10.48550/arXiv.2209.07923

Guy Erez, R. Weber, O. Freifeld

Pages: 177-194

Citations: 1

Abstract

In video analysis, background models have many applications, such as background/foreground separation, change detection, anomaly detection, and tracking. However, while learning such a model from a video captured by a static camera is a fairly well-solved task, success with a Moving-camera Background Model (MCBM) has been far more modest because of the algorithmic and scalability challenges that camera motion introduces. Consequently, existing MCBMs are limited in their scope and in the camera-motion types they support. These hurdles have also impeded the use of end-to-end deep-learning (DL) solutions for this unsupervised task. Moreover, existing MCBMs usually model the background either on the domain of a typically large panoramic image or in an online fashion. Unfortunately, the former creates several problems, including poor scalability, while the latter prevents the model from recognizing and leveraging cases where the camera revisits previously seen parts of the scene. This paper proposes a new method, called DeepMCBM, that eliminates all of the aforementioned issues and achieves state-of-the-art results. Concretely, we first identify the difficulties associated with the joint alignment of video frames, both in general and in a DL setting in particular. Next, we propose a new strategy for joint alignment that lets us use a spatial transformer net with neither regularization nor any form of specialized (and non-differentiable) initialization. Coupled with an autoencoder conditioned on unwarped robust central moments (obtained from the joint alignment), this yields an end-to-end, regularization-free MCBM that supports a broad range of camera motions and scales gracefully. We demonstrate DeepMCBM's utility on a variety of videos, including ones beyond the scope of other methods. Our code is available at https://github.com/BGU-CS-VIL/DeepMCBM .
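To make "joint alignment" and "unwarped robust central moments" concrete, here is a minimal toy sketch. It is NOT DeepMCBM: it swaps the learned spatial transformer net for known integer (dy, dx) translations and swaps the autoencoder for a simple per-pixel median/MAD threshold. All function names (`align_to_canvas`, `robust_moments`, `unwarp`, `foreground_mask`) and the parameters `k` and `eps` are hypothetical, chosen only for illustration. The sketch aligns all frames into one shared canvas offline, computes a robust location (median) and robust spread (median absolute deviation) per canvas pixel, maps those statistics back ("unwarps" them) into each frame's coordinates, and flags pixels that deviate strongly.

```python
from statistics import median

# Hypothetical sketch: frames are 2D lists of grayscale values, and each frame's
# placement in a shared canvas is a KNOWN integer (dy, dx) translation. DeepMCBM
# instead learns the alignment with a spatial transformer net.


def align_to_canvas(frames, offsets, canvas_h, canvas_w):
    """Jointly align frames: collect every value observed at each canvas pixel."""
    stacks = [[[] for _ in range(canvas_w)] for _ in range(canvas_h)]
    for frame, (dy, dx) in zip(frames, offsets):
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                stacks[y + dy][x + dx].append(value)
    return stacks


def robust_moments(stacks):
    """Per-pixel median (robust location) and MAD (robust spread).

    Canvas pixels that no frame covers get None."""
    med = [[median(s) if s else None for s in row] for row in stacks]
    mad = [
        [median(abs(v - m) for v in s) if s else None for s, m in zip(srow, mrow)]
        for srow, mrow in zip(stacks, med)
    ]
    return med, mad


def unwarp(canvas_map, offset, h, w):
    """Map a canvas-domain statistic back into one frame's own coordinates."""
    dy, dx = offset
    return [[canvas_map[y + dy][x + dx] for x in range(w)] for y in range(h)]


def foreground_mask(frame, background, spread, k=3.0, eps=1.0):
    """Flag pixels deviating from the background by more than k robust spreads.

    eps floors the spread so a zero MAD does not flag tiny noise. Every pixel a
    frame covers has statistics, so unwarped values here are never None."""
    return [
        [abs(v - b) > k * max(s, eps) for v, b, s in zip(frow, brow, srow)]
        for frow, brow, srow in zip(frame, background, spread)
    ]


# Toy scene: a 2x3 background seen through a 2x2 window. Frames 1-3 revisit the
# same region; frame 3 also contains a bright foreground pixel (99).
frames = [
    [[10, 20], [40, 50]],
    [[20, 30], [50, 60]],
    [[20, 30], [50, 60]],
    [[20, 99], [50, 60]],
]
offsets = [(0, 0), (0, 1), (0, 1), (0, 1)]

med, mad = robust_moments(align_to_canvas(frames, offsets, 2, 3))
bg3 = unwarp(med, offsets[3], 2, 2)
spread3 = unwarp(mad, offsets[3], 2, 2)
print(foreground_mask(frames[3], bg3, spread3))  # [[False, True], [False, False]]
```

Because all frames are aligned jointly rather than online, observations from the revisited region accumulate in the same canvas stacks. The robust statistics then resist the foreground outlier: at the canvas pixel holding [30, 30, 99], the median stays at 30, whereas the mean would be pulled up to 53.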
