{"title":"RGB-T跟踪的跨模态注意网络","authors":"Yang Yang, Hong Liang, Yue Yang, Tao Feng","doi":"10.1109/CTISC52352.2021.00068","DOIUrl":null,"url":null,"abstract":"RGB-T tracking has attracted more and more attention due to its excellent performance. However, how to make full use of the complementary advantages of visible light images and thermal infrared images in RGB-T tracking without losing this advantage in deep feature learning is still a challenge. This paper proposes a Cross-modal Attention Network, which is corrected by triple attention after each feature information is extracted to obtain richer modal feature information. Then a parallel and layer-by-layer interactive network is used to realize the feature complementarity between the two modalities and ensure that the complementary advantages are not lost in deep learning. A large number of experiments on two RGB-T benchmark datasets verify the effectiveness of this algorithm.","PeriodicalId":268378,"journal":{"name":"2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Cross-modal Attention Network for RGB-T Tracking\",\"authors\":\"Yang Yang, Hong Liang, Yue Yang, Tao Feng\",\"doi\":\"10.1109/CTISC52352.2021.00068\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"RGB-T tracking has attracted more and more attention due to its excellent performance. However, how to make full use of the complementary advantages of visible light images and thermal infrared images in RGB-T tracking without losing this advantage in deep feature learning is still a challenge. This paper proposes a Cross-modal Attention Network, which is corrected by triple attention after each feature information is extracted to obtain richer modal feature information. 
Then a parallel and layer-by-layer interactive network is used to realize the feature complementarity between the two modalities and ensure that the complementary advantages are not lost in deep learning. A large number of experiments on two RGB-T benchmark datasets verify the effectiveness of this algorithm.\",\"PeriodicalId\":268378,\"journal\":{\"name\":\"2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CTISC52352.2021.00068\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CTISC52352.2021.00068","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
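The abstract does not give implementation details of the triple-attention or interaction modules, but the overall idea — re-weight each modality's feature channels by attention, then fuse the two modalities so their complementary information is combined — can be illustrated with a minimal plain-Python sketch. All function names here (`channel_attention`, `cross_modal_fuse`) are hypothetical and stand in for the paper's actual modules; this is an assumption-laden sketch, not the authors' implementation.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def channel_attention(features):
    # features: list of per-channel 2-D feature maps (list of lists).
    # Pool each channel to a scalar, turn the pooled scores into
    # attention weights, and re-weight every channel accordingly.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in features]
    weights = softmax(pooled)
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(features, weights)]

def cross_modal_fuse(rgb, thermal):
    # Attend within each modality first, then sum element-wise so the
    # fused maps carry complementary information from both modalities.
    rgb_a = channel_attention(rgb)
    th_a = channel_attention(thermal)
    return [[[r + t for r, t in zip(r_row, t_row)]
             for r_row, t_row in zip(r_ch, t_ch)]
            for r_ch, t_ch in zip(rgb_a, th_a)]
```

In the paper's design this kind of exchange happens in parallel at every backbone layer rather than once at the end, which is what keeps the complementary advantages from being washed out in deep feature learning.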