Higher-Order Integration of Hierarchical Convolutional Activations for Fine-Grained Visual Categorization

2017 IEEE International Conference on Computer Vision (ICCV), Pub Date: 2017-10-01, DOI: 10.1109/ICCV.2017.63

Sijia Cai, W. Zuo, Lei Zhang


Citations: 163

Abstract

The success of fine-grained visual categorization (FGVC) relies heavily on modeling the appearance and interactions of various semantic parts. This makes FGVC very challenging because: (i) part annotation and detection require expert guidance and are very expensive; (ii) parts are of different sizes; and (iii) the part interactions are complex and of higher order. To address these issues, we propose an end-to-end framework based on higher-order integration of hierarchical convolutional activations for FGVC. By treating the convolutional activations as local descriptors, hierarchical convolutional activations can serve as a representation of local parts at different scales. A polynomial-kernel-based predictor is proposed to capture higher-order statistics of convolutional activations for modeling part interactions. To model inter-layer part interactions, we extend the polynomial predictor to integrate hierarchical activations via kernel fusion. Our work also provides a new perspective on combining convolutional activations from multiple layers. While hypercolumns simply concatenate maps from different layers, and the holistically-nested network uses weighted fusion to combine side-outputs, our approach exploits higher-order intra-layer and inter-layer relations for better integration of hierarchical convolutional features. The proposed framework yields a more discriminative representation and achieves competitive results on widely used FGVC datasets.
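As a rough illustration of what "higher-order statistics of convolutional activations" can mean, the sketch below scores a feature map with a homogeneous degree-2 polynomial over its local descriptors, plus a cross-layer analogue. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dense weight matrices, and the assumption that both layers share the same spatial size are all hypothetical choices made for this example.

```python
import numpy as np


def second_order_score(activations, weights):
    """Degree-2 polynomial score summed over all spatial positions.

    activations: (H, W, C) feature map; each position is a C-dim local descriptor.
    weights: (C, C) hypothetical matrix weighting pairwise channel interactions.
    """
    descriptors = activations.reshape(-1, activations.shape[-1])  # (H*W, C)
    # For each descriptor x, compute x^T W x, then sum over positions.
    return float(np.einsum("nc,cd,nd->", descriptors, weights, descriptors))


def cross_layer_score(acts_a, acts_b, weights):
    """Inter-layer analogue: channel interactions between two layers.

    Assumes (for this sketch only) that both maps share the same H and W,
    so descriptors at the same position can be paired.
    weights: (C_a, C_b) hypothetical interaction matrix.
    """
    xa = acts_a.reshape(-1, acts_a.shape[-1])
    xb = acts_b.reshape(-1, acts_b.shape[-1])
    return float(np.einsum("nc,cd,nd->", xa, weights, xb))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 4, 8))
    w = rng.standard_normal((8, 8))
    # The score equals the inner product of W with the pooled
    # second-order statistic sum_x x x^T.
    flat = feat.reshape(-1, 8)
    pooled = flat.T @ flat
    assert np.isclose(second_order_score(feat, w), np.sum(w * pooled))
```

The final check shows why such a predictor captures second-order statistics: summing x^T W x over positions is the same as taking the inner product of W with the pooled outer-product matrix of the descriptors.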



