{"title":"Adaptive Bilinear Pooling for Fine-grained Representation Learning","authors":"Shaobo Min, Hongtao Xie, Youliang Tian, Hantao Yao, Yongdong Zhang","doi":"10.1145/3338533.3366567","DOIUrl":null,"url":null,"abstract":"Fine-grained representation learning targets to generate discriminative description for fine-grained visual objects. Recently, the bilinear feature interaction has been proved effective in generating powerful high-order representation with spatially invariant information. However, the existing methods apply a fixed feature interaction strategy to all samples, which ignore the image and region heterogeneity in a dataset. To this end, we propose a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on image content. Specifically, ABP consists of two learning strategies: p-order learning (P-net) and spatial attention learning (S-net). The p-order learning predicts an optimal exponential coefficient rather than a fixed order number to extract moderate visual information from an image. The spatial attention learning aims to infer a weighted score that measures the importance of each local region, which can compact the image representations. To make ABP compatible with kernelized bilinear feature interaction, a crossed two-branch structure is utilized to combine the P-net and S-net. This structure can facilitate complementary information exchange between two different visual branches. The experiments on three widely used benchmarks, including fine-grained object classification and action recognition, demonstrate the effectiveness of the proposed method.","PeriodicalId":273086,"journal":{"name":"Proceedings of the ACM Multimedia Asia","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Multimedia Asia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3338533.3366567","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Fine-grained representation learning targets to generate discriminative description for fine-grained visual objects. Recently, the bilinear feature interaction has been proved effective in generating powerful high-order representation with spatially invariant information. However, the existing methods apply a fixed feature interaction strategy to all samples, which ignore the image and region heterogeneity in a dataset. To this end, we propose a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on image content. Specifically, ABP consists of two learning strategies: p-order learning (P-net) and spatial attention learning (S-net). The p-order learning predicts an optimal exponential coefficient rather than a fixed order number to extract moderate visual information from an image. The spatial attention learning aims to infer a weighted score that measures the importance of each local region, which can compact the image representations. To make ABP compatible with kernelized bilinear feature interaction, a crossed two-branch structure is utilized to combine the P-net and S-net. This structure can facilitate complementary information exchange between two different visual branches. The experiments on three widely used benchmarks, including fine-grained object classification and action recognition, demonstrate the effectiveness of the proposed method.