{"title":"用于自适应多尺度特征表示的选择性深度注意网络","authors":"Qingbei Guo;Xiao-Jun Wu;Tianyang Xu;Tongzhen Si;Cong Hu;Jinglan Tian","doi":"10.1109/TAI.2024.3401652","DOIUrl":null,"url":null,"abstract":"Existing multiscale methods lead to a risk of just increasing the receptive field sizes while neglecting small receptive fields. Thus, it is a challenging problem to effectively construct adaptive neural networks for recognizing various spatial-scale objects. To tackle this issue, we first introduce a new attention dimension, i.e., depth, in addition to existing attentions such as channel-attention, spatial-attention, branch-attention, and self-attention. We present a novel selective depth attention network to treat multiscale objects symmetrically in various vision tasks. Specifically, the blocks within each stage of neural networks, including convolutional neural networks (CNNs), e.g., ResNet, SENet, and Res2Net, and vision transformers (ViTs), e.g., PVTv2, output the hierarchical feature maps with the same resolution but different receptive field sizes. Based on this structural property, we design a depthwise building module, namely an selective depth attention (SDA) module, including a trunk branch and a SE-like attention branch. The block outputs of the trunk branch are fused to guide their depth attention allocation through the attention branch globally. According to the proposed attention mechanism, we dynamically select different depth features, which contributes to adaptively adjusting the receptive field sizes for the variable-sized input objects. Moreover, our method is orthogonal to multiscale networks and attention networks, so-called SDA-\n<inline-formula><tex-math>$x$</tex-math></inline-formula>\nNet. 
Extensive experiments demonstrate that the proposed SDA method significantly improves the original performance as a lightweight and efficient plug-in on numerous computer vision tasks, e.g., image classification, object detection, and instance segmentation.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Selective Depth Attention Networks for Adaptive Multiscale Feature Representation\",\"authors\":\"Qingbei Guo;Xiao-Jun Wu;Tianyang Xu;Tongzhen Si;Cong Hu;Jinglan Tian\",\"doi\":\"10.1109/TAI.2024.3401652\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Existing multiscale methods lead to a risk of just increasing the receptive field sizes while neglecting small receptive fields. Thus, it is a challenging problem to effectively construct adaptive neural networks for recognizing various spatial-scale objects. To tackle this issue, we first introduce a new attention dimension, i.e., depth, in addition to existing attentions such as channel-attention, spatial-attention, branch-attention, and self-attention. We present a novel selective depth attention network to treat multiscale objects symmetrically in various vision tasks. Specifically, the blocks within each stage of neural networks, including convolutional neural networks (CNNs), e.g., ResNet, SENet, and Res2Net, and vision transformers (ViTs), e.g., PVTv2, output the hierarchical feature maps with the same resolution but different receptive field sizes. Based on this structural property, we design a depthwise building module, namely an selective depth attention (SDA) module, including a trunk branch and a SE-like attention branch. The block outputs of the trunk branch are fused to guide their depth attention allocation through the attention branch globally. 
According to the proposed attention mechanism, we dynamically select different depth features, which contributes to adaptively adjusting the receptive field sizes for the variable-sized input objects. Moreover, our method is orthogonal to multiscale networks and attention networks, so-called SDA-\\n<inline-formula><tex-math>$x$</tex-math></inline-formula>\\nNet. Extensive experiments demonstrate that the proposed SDA method significantly improves the original performance as a lightweight and efficient plug-in on numerous computer vision tasks, e.g., image classification, object detection, and instance segmentation.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-03-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10531158/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10531158/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Selective Depth Attention Networks for Adaptive Multiscale Feature Representation
Existing multiscale methods risk merely enlarging receptive field sizes while neglecting small receptive fields. Effectively constructing adaptive neural networks that recognize objects across spatial scales therefore remains a challenging problem. To tackle this issue, we first introduce a new attention dimension, depth, complementing existing attention mechanisms such as channel attention, spatial attention, branch attention, and self-attention. We present a novel selective depth attention network that treats multiscale objects symmetrically in various vision tasks. Specifically, the blocks within each stage of a neural network, whether a convolutional neural network (CNN) such as ResNet, SENet, or Res2Net, or a vision transformer (ViT) such as PVTv2, output hierarchical feature maps with the same resolution but different receptive field sizes. Based on this structural property, we design a depthwise building module, namely the selective depth attention (SDA) module, comprising a trunk branch and an SE-like attention branch. The block outputs of the trunk branch are fused to globally guide their depth attention allocation through the attention branch. Following this attention mechanism, we dynamically select features at different depths, which adaptively adjusts the receptive field sizes to variable-sized input objects. Moreover, our method is orthogonal to multiscale and attention networks, yielding the so-called SDA-$x$Net. Extensive experiments demonstrate that the proposed SDA method, as a lightweight and efficient plug-in, significantly improves baseline performance on numerous computer vision tasks, e.g., image classification, object detection, and instance segmentation.
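To make the mechanism concrete, below is a minimal NumPy sketch of a depth-attention module in the spirit the abstract describes: the same-resolution block outputs of one stage are fused (trunk), squeezed by global average pooling, passed through an SE-like bottleneck, and the resulting per-depth weights reweight the stack. This is not the authors' implementation; the two-layer bottleneck, the weight shapes (`w1`, `w2`), and the softmax competition across the depth axis are illustrative assumptions based only on the "SE-like attention branch" description.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sda_module(block_outputs, w1, w2):
    """Hypothetical selective depth attention over one stage's block outputs.

    block_outputs: list of D arrays, each (C, H, W) -- the hierarchical
        feature maps of a stage (same resolution, growing receptive field).
    w1: (C, r) and w2: (r, D*C) -- assumed SE-like bottleneck weights.
    Returns a depth-attended fusion of shape (C, H, W).
    """
    stacked = np.stack(block_outputs)           # (D, C, H, W)
    fused = stacked.sum(axis=0)                 # trunk branch: fuse all depths
    squeeze = fused.mean(axis=(1, 2))           # global average pool -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)      # bottleneck + ReLU -> (r,)
    D, C = stacked.shape[0], stacked.shape[1]
    logits = (hidden @ w2).reshape(D, C)        # per-depth, per-channel scores
    attn = softmax(logits, axis=0)              # depths compete for attention
    return (attn[:, :, None, None] * stacked).sum(axis=0)
```

Because the softmax is taken across the depth axis, the weights for each channel sum to one over the D blocks, so the module selects a soft mixture of receptive field sizes per channel; the output keeps the input resolution (C, H, W) and can be dropped into a stage as a plug-in.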