Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts

2007 IEEE Conference on Computer Vision and Pattern Recognition Pub Date : 2007-06-17 DOI:10.1109/CVPR.2007.383269

S. Fidler, A. Leonardis

{"title":"Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts","authors":"S. Fidler, A. Leonardis","doi":"10.1109/CVPR.2007.383269","DOIUrl":null,"url":null,"abstract":"This paper proposes a novel approach to constructing a hierarchical representation of visual input that aims to enable recognition and detection of a large number of object categories. Inspired by the principles of efficient indexing (bottom-up,), robust matching (top-down,), and ideas of compositionality, our approach learns a hierarchy of spatially flexible compositions, i.e. parts, in an unsupervised, statistics-driven manner. Starting with simple, frequent features, we learn the statistically most significant compositions (parts composed of parts), which consequently define the next layer. Parts are learned sequentially, layer after layer, optimally adjusting to the visual data. Lower layers are learned in a category-independent way to obtain complex, yet sharable visual building blocks, which is a crucial step towards a scalable representation. Higher layers of the hierarchy, on the other hand, are constructed by using specific categories, achieving a category representation with a small number of highly generalizable parts that gained their structural flexibility through composition within the hierarchy. Built in this way, new categories can be efficiently and continuously added to the system by adding a small number of parts only in the higher layers. The approach is demonstrated on a large collection of images and a variety of object categories. Detection results confirm the effectiveness and robustness of the learned parts.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"217","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2007.383269","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 217

Abstract

This paper proposes a novel approach to constructing a hierarchical representation of visual input that aims to enable recognition and detection of a large number of object categories. Inspired by the principles of efficient indexing (bottom-up,), robust matching (top-down,), and ideas of compositionality, our approach learns a hierarchy of spatially flexible compositions, i.e. parts, in an unsupervised, statistics-driven manner. Starting with simple, frequent features, we learn the statistically most significant compositions (parts composed of parts), which consequently define the next layer. Parts are learned sequentially, layer after layer, optimally adjusting to the visual data. Lower layers are learned in a category-independent way to obtain complex, yet sharable visual building blocks, which is a crucial step towards a scalable representation. Higher layers of the hierarchy, on the other hand, are constructed by using specific categories, achieving a category representation with a small number of highly generalizable parts that gained their structural flexibility through composition within the hierarchy. Built in this way, new categories can be efficiently and continuously added to the system by adding a small number of parts only in the higher layers. The approach is demonstrated on a large collection of images and a variety of object categories. Detection results confirm the effectiveness and robustness of the learned parts.

查看原文本刊更多论文

面向对象类别的可伸缩表示:学习零件的层次结构

本文提出了一种新的方法来构建视觉输入的分层表示，旨在实现对大量对象类别的识别和检测。受高效索引(自下而上)、健壮匹配(自上而下)和组合性思想的启发，我们的方法以无监督、统计驱动的方式学习空间灵活组合(即部件)的层次结构。从简单、频繁的特征开始，我们学习统计上最重要的组成(由部分组成的部分)，从而定义下一层。部分是按顺序学习的，一层又一层，以最佳方式调整视觉数据。较低的层以一种与类别无关的方式学习，以获得复杂但可共享的视觉构建块，这是迈向可扩展表示的关键一步。另一方面，层次结构的更高层是通过使用特定的类别来构建的，用少量高度一般化的部分实现类别表示，这些部分通过层次结构内的组合获得结构灵活性。通过这种方式构建，只需在较高层中添加少量部件，就可以高效且持续地将新类别添加到系统中。该方法在大量图像和各种对象类别上进行了演示。检测结果证实了学习部分的有效性和鲁棒性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2007 IEEE Conference on Computer Vision and Pattern Recognition

自引率

0.00%

发文量