Adaptive Bilinear Pooling for Fine-grained Representation Learning

Proceedings of the ACM Multimedia Asia, Pub Date: 2019-12-15, DOI: 10.1145/3338533.3366567

Shaobo Min, Hongtao Xie, Youliang Tian, Hantao Yao, Yongdong Zhang


Citations: 3

Abstract


Fine-grained representation learning aims to generate discriminative descriptions of fine-grained visual objects. Recently, bilinear feature interaction has proven effective at generating powerful high-order representations with spatially invariant information. However, existing methods apply a fixed feature interaction strategy to all samples, ignoring the image and region heterogeneity in a dataset. To this end, we propose a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which adaptively infers a suitable pooling strategy for a given sample based on image content. Specifically, ABP consists of two learning strategies: p-order learning (P-net) and spatial attention learning (S-net). P-order learning predicts an optimal exponential coefficient, rather than a fixed order number, to extract moderate visual information from an image. Spatial attention learning infers a weighted score that measures the importance of each local region, which can make the image representations more compact. To make ABP compatible with kernelized bilinear feature interaction, a crossed two-branch structure is used to combine the P-net and S-net. This structure can facilitate the exchange of complementary information between the two visual branches. Experiments on three widely used benchmarks, covering fine-grained object classification and action recognition, demonstrate the effectiveness of the proposed method.
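The abstract does not give ABP's equations, so the following is only a rough sketch of the general idea it describes: second-order (bilinear) pooling in which each local region is weighted by an importance score and the pooled result is passed through a per-sample power normalization. Everything here is an assumption made for illustration: the function name `weighted_bilinear_pool`, the softmax over region scores, the signed power with exponent `p`, and the L2 normalization. In the paper, the scores and exponent would come from the learned S-net and P-net; here they are simply passed in by hand, and the crossed two-branch structure is not modeled.

```python
import math


def softmax(scores):
    """Turn raw region scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_bilinear_pool(features, region_scores, p):
    """Pool N local descriptors into one normalized second-order vector.

    features:      list of N local descriptors, each a list of C floats.
    region_scores: N raw importance scores (hypothetical stand-in for
                   what a spatial attention module would predict).
    p:             positive exponent (hypothetical stand-in for what a
                   p-order module would predict for this sample).
    """
    weights = softmax(region_scores)
    c = len(features[0])

    # Weighted sum of outer products x x^T over all local regions.
    pooled = [[0.0] * c for _ in range(c)]
    for w, x in zip(weights, features):
        for i in range(c):
            for j in range(c):
                pooled[i][j] += w * x[i] * x[j]

    # Flatten, apply a signed element-wise power, then L2-normalize.
    flat = [v for row in pooled for v in row]
    powered = [math.copysign(abs(v) ** p, v) for v in flat]
    norm = math.sqrt(sum(v * v for v in powered))
    return [v / norm for v in powered] if norm > 0 else powered


# Three local regions with two channels each (toy values).
features = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

# Fixed strategy: every region weighted equally, exponent 1.
uniform = weighted_bilinear_pool(features, [0.0, 0.0, 0.0], p=1.0)

# Sample-dependent strategy: region 2 emphasized, exponent 0.5.
focused = weighted_bilinear_pool(features, [0.0, 5.0, 0.0], p=0.5)
```

With equal scores and `p=1.0` this reduces to plain averaged bilinear pooling, which corresponds to the fixed strategy the abstract contrasts against. Changing the scores or the exponent per sample is the kind of adaptivity the abstract attributes to ABP, though the actual learned modules may differ substantially from this sketch.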


Source journal

Proceedings of the ACM Multimedia Asia

Self-citation rate

0.00%

Number of publications