Learning a fast 3D spectral approach to object segmentation and tracking over space and time

IF 4.6 | Zone 2, Computer Science | Q1 Computer Science, Artificial Intelligence

Artificial Intelligence Pub Date : 2025-01-10 DOI:10.1016/j.artint.2024.104281

Elena Burceanu , Marius Leordeanu

Artificial Intelligence, volume 340, Article 104281. Journal article, not open access. https://www.sciencedirect.com/science/article/pii/S0004370224002170

Citations: 0

Abstract

We pose video object segmentation as spectral graph clustering in space and time, with one graph node for each pixel and edges forming local space-time neighborhoods. We claim that the strongest cluster in this video graph represents the salient object. We start by introducing a novel and efficient method based on 3D filtering for approximating the spectral solution, as the principal eigenvector of the graph's adjacency matrix, without explicitly building the matrix. This key property allows us to have a fast parallel implementation on GPU, orders of magnitude faster than classical approaches for computing the eigenvector. Our motivation for a spectral space-time clustering approach, unique in video semantic segmentation literature, is that such clustering is dedicated to preserving object consistency over time, which we evaluate using our novel segmentation consistency measure. Further on, we show how to efficiently learn the solution over multiple input feature channels. Finally, we extend the formulation of our approach beyond the segmentation task, into the realm of object tracking. In extensive experiments we show significant improvements over top methods, as well as over powerful ensembles that combine them, achieving state-of-the-art on multiple benchmarks, both for tracking and segmentation.
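The matrix-free idea in the abstract can be illustrated with a minimal sketch. It assumes a simplified edge weight that is not taken from the paper: each pair of voxels within a small space-time window is connected with strength `s_i * s_j`, where `s` is a per-pixel saliency prior. Under that assumption the adjacency matrix is `M = diag(s) G diag(s)`, with `G` a 3D box filter, so the product `M x` reduces to `s * filter3d(s * x)` and power iteration never needs to build `M`. The names `box_filter_3d`, `spectral_saliency`, `unary`, and `mask` are hypothetical and chosen only for this example.

```python
import numpy as np

def box_filter_3d(vol, radius=1):
    """Sum over each voxel's (2r+1)^3 space-time window, zero-padded at borders."""
    padded = np.pad(vol, radius, mode="constant")
    out = np.zeros_like(vol)
    t, h, w = vol.shape
    size = 2 * radius + 1
    # Accumulate shifted copies of the volume: one per offset in the window.
    for dt in range(size):
        for dy in range(size):
            for dx in range(size):
                out += padded[dt:dt + t, dy:dy + h, dx:dx + w]
    return out

def spectral_saliency(unary, radius=1, iters=50):
    """Power iteration for the leading eigenvector of M = diag(s) G diag(s).

    ASSUMPTION: this edge weight is illustrative, not the paper's formulation.
    M is never materialized; M @ x is computed as s * filter3d(s * x).
    """
    x = np.ones(unary.shape, dtype=float)
    x /= np.linalg.norm(x)
    for _ in range(iters):
        x = unary * box_filter_3d(unary * x, radius)
        x /= np.linalg.norm(x)
    return x

# Toy video (T, H, W): a bright 6x6 square drifting one pixel per frame.
T, H, W = 6, 16, 16
mask = np.zeros((T, H, W), dtype=bool)
for frame in range(T):
    mask[frame, 4:10, 2 + frame:8 + frame] = True
unary = np.where(mask, 0.9, 0.1)

x = spectral_saliency(unary)
print("mean inside object: ", x[mask].mean())
print("mean outside object:", x[~mask].mean())
```

In this toy setting the eigenvector mass concentrates on the moving square, which is the sense in which the strongest cluster of the space-time graph marks the salient object. Each iteration is a fixed set of local filtering operations over the volume, which is what makes this kind of computation straightforward to parallelize.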


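Source journal: Artificial Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.20

Self-citation rate: 1.40%

Articles published: 118

Review time: 8 months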

About the journal: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. The journal also accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. Beyond regular papers, AIJ accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.