Authors: Xinping Pan, Zhen Wang, Xiao Shi, Jianing Li
Venue: International Conference on Optoelectronic Information and Computer Engineering (OICE)
Published: 2023-08-01
DOI: 10.1117/12.2691214 (https://doi.org/10.1117/12.2691214)
Citations: 0
CotNet target tracking algorithm based on attention mechanism and context-awareness
In recent years, Siamese network algorithms based on deep learning have achieved good tracking accuracy and speed, and they have become a research hotspot in target tracking. However, the traditional Siamese network algorithm lacks a holistic view of the target and extracts only shallow features, so it easily loses the target in complex environments. To address this, the paper proposes a target tracking algorithm built on the Contextual Transformer network for visual recognition (CotNet), combining attention mechanisms with context awareness. The algorithm uses the CotNet50 network as its backbone, a residual network variant with a self-attention mechanism, which enhances the feature representation capability of the model and improves the algorithm's performance. In addition, to handle changes in target appearance during tracking, an efficient channel attention module and a global contextual feature module are embedded in the backbone branch, strengthening the network's overall perception of the target and improving tracking accuracy. On the VOT2018 dataset, the algorithm improves accuracy, robustness, and EAO (Expected Average Overlap) by 7.3%, 13.95%, and 11.9%, respectively, compared with SiamFC. It also tracks well in complex scenes on the OTB100 dataset.
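The abstract does not give the internals of the efficient channel attention module. As a rough illustration only, the sketch below assumes it follows the common ECA-style recipe: global average pooling per channel, a small 1D convolution across neighboring channels, a sigmoid, and channel-wise rescaling. The function names, the uniform kernel, and the toy feature map are hypothetical and are not taken from the paper; in a trained network the kernel values would be learned.

```python
import math


def eca_weights(channel_means, kernel_size=3):
    """Return one attention weight in (0, 1) per channel.

    A 1D convolution with a shared kernel slides across the channel
    descriptors (zero padded at both ends), followed by a sigmoid.
    The uniform kernel is a placeholder assumption, not a learned one.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    kernel = [1.0 / kernel_size] * kernel_size  # hypothetical fixed weights
    pad = kernel_size // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        z = sum(k * padded[c + j] for j, k in enumerate(kernel))
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights


def efficient_channel_attention(feature_map, kernel_size=3):
    """Rescale each channel of a C x H x W nested list by its weight."""
    # Squeeze: global average pooling gives one descriptor per channel.
    means = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    weights = eca_weights(means, kernel_size)
    # Rescale: multiply every value in a channel by that channel's weight.
    return [
        [[value * w for value in row] for row in channel]
        for channel, w in zip(feature_map, weights)
    ]


# Toy 3-channel, 2x2 feature map (illustrative values only).
features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[4.0, 4.0], [4.0, 4.0]],  # channel 1, mean 4.0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 2, mean 0.0
]
attended = efficient_channel_attention(features)
```

Because each weight depends only on a channel and its immediate neighbors, a module of this kind adds very few parameters, which is one plausible reason for embedding it in a tracking backbone where speed matters.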