A joint time and spatial attention-based transformer approach for recognizing the behaviors of wild giant pandas

IF 5.8; Zone 2, Environmental Science and Ecology; JCR Q1, Ecology
Jing Liu, Jin Hou, Dan Liu, Qijun Zhao, Rui Chen, Xiaoyuan Chen, Vanessa Hull, Jindong Zhang, Jifeng Ning
Journal: Ecological Informatics (journal article), published 2024-08-26
DOI: 10.1016/j.ecoinf.2024.102797
Source: https://www.sciencedirect.com/science/article/pii/S157495412400339X
Citations: 0

Abstract

Wild giant pandas, an endangered species found only in China, are a focus of conservation efforts. Their behavior reflects their health and activity capabilities, and therefore plays an important role in formulating and implementing conservation measures. Efficient behavior recognition methods based on deep learning can significantly advance the study of wild giant panda behavior. This study introduces, for the first time, a transformer-based behavior recognition method, termed PandaFormer, which uses joint time and spatial attention to analyze the temporal patterns of behaviors and to estimate activity areas. The method combines cross-fusion recurrent time encoding with transformer modules to handle both the temporal dynamics and the spatial relationships in panda behavior videos. First, we design a cross-fusion recurrent time encoding to represent the time at which a behavior occurs. Leveraging the multimodal processing capability of the transformer, we feed time tokens and video tokens into the transformer module to explore the relation between a behavior and its occurrence time. Second, we introduce relative temporal weights between video frames so that the model can learn sequential relationships. Finally, because the camera stays in a fixed position during recording, we propose a spatial attention mechanism based on an estimate of the panda's activity area. To validate the model, we constructed a video dataset of wild giant pandas covering five typical behaviors, annotated at the video level, and evaluated the proposed method on it. The method achieves a Top-1 accuracy of 92.25% and a mean class precision of 91.19%, surpassing state-of-the-art behavior recognition algorithms by a large margin. Ablation experiments further validate the effectiveness of the proposed temporal and spatial attention mechanisms. In conclusion, the proposed method offers an effective way of studying panda behavior and holds potential for application to other wildlife species.
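The abstract reports two metrics, and it may help to see how they can differ on the same predictions. The following is a minimal sketch, not the authors' evaluation code. It assumes that "mean class precision" means macro-averaged precision (per-class precision averaged with equal weight per class); the paper may define it differently. The function names and the class labels "A", "B", and "C" are hypothetical placeholders, since the abstract does not name the five behaviors.

```python
from collections import defaultdict


def top1_accuracy(labels, predictions):
    """Fraction of videos whose top predicted class equals the annotated class."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def mean_class_precision(labels, predictions):
    """Macro-averaged precision over the classes that occur in `labels`.

    Assumption: a class that is never predicted contributes a precision of 0.0.
    """
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    true_positives = defaultdict(int)   # correct predictions per class
    predicted_count = defaultdict(int)  # all predictions per class
    for y, p in zip(labels, predictions):
        predicted_count[p] += 1
        if y == p:
            true_positives[p] += 1
    classes = sorted(set(labels))
    per_class = [
        true_positives[c] / predicted_count[c] if predicted_count[c] else 0.0
        for c in classes
    ]
    return sum(per_class) / len(per_class)


# Toy video-level annotations and predictions (placeholder class names).
labels = ["A", "A", "B", "B", "C", "C"]
predictions = ["A", "B", "B", "B", "C", "A"]

print(f"Top-1 accuracy:       {top1_accuracy(labels, predictions):.4f}")
print(f"Mean class precision: {mean_class_precision(labels, predictions):.4f}")
```

Top-1 accuracy weights every video equally, whereas the macro average weights every class equally, so the two numbers diverge when some behaviors are rarer or harder to recognize than others.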

Source journal
Ecological Informatics (Environmental Science: Ecology)
CiteScore: 8.30
Self-citation rate: 11.80%
Articles published: 346
Review time: 46 days
Journal description: The journal Ecological Informatics is devoted to the publication of high-quality, peer-reviewed articles on all aspects of computational ecology, data science and biogeography. The scope of the journal takes into account the data-intensive nature of ecology, the growing capacity of information technology to access, harness and leverage complex data, and the critical need to inform sustainable management in view of global environmental and climate change. The journal is interdisciplinary, at the crossover between ecology and informatics. It focuses on novel concepts and techniques for image- and genome-based monitoring and interpretation, sensor- and multimedia-based data acquisition, internet-based data archiving and sharing, data assimilation, and the modelling and prediction of ecological data.