AnimalRTPose: Faster cross-species real-time animal pose estimation

Impact factor 6.3 · Zone 1, Computer Science · JCR Q1, Computer Science, Artificial Intelligence
Xin Wu, Lianming Wang, Jipeng Huang
DOI: 10.1016/j.neunet.2025.107685
Journal: Neural Networks, vol. 190, Article 107685
Published: 2025-06-10
URL: https://www.sciencedirect.com/science/article/pii/S0893608025005659
Citations: 0

Abstract

Recent advances in computer vision have enabled sophisticated tools for analyzing complex animal behaviors, yet the diversity of animal morphology and the complexity of natural environments still pose significant challenges to real-time animal pose estimation. To address these challenges, we introduce AnimalRTPose, a one-stage model designed for cross-species real-time animal pose estimation. At its core, AnimalRTPose uses CSPNeXt†, a novel backbone network that combines three components: depthwise separable convolution with skip connections for high-frequency feature extraction, a channel attention mechanism (CAM) that enhances the fusion of high-frequency and low-frequency features, and spatial pyramid pooling (SPP) that captures multi-scale contextual information. This architecture yields robust feature representations across varying spatial resolutions, improving adaptability to diverse species and environments. In addition, AnimalRTPose incorporates an efficient multi-scale feature fusion module that dynamically balances local detail against global structural consistency, supporting accurate and robust pose estimation. Designed for scalability and versatility, AnimalRTPose supports single-animal, multi-animal, cross-species, and few-shot scenarios. Specifically, AnimalRTPose-N achieves 476 FPS on the NVIDIA RTX 2080Ti, 769 FPS on the NVIDIA RTX 3090, and 1111 FPS on the NVIDIA A800. It also sustains high throughput on edge devices: 196 FPS on the NVIDIA Jetson™ AGX Orin Developer Kit (275 TOPS, 15 W to 60 W), 77 FPS on the Raspberry Pi 5 with AI HAT+ (26 TOPS, 25 W), and 64 FPS on the Atlas 200I Developer Kit A2 (8 TOPS, 24 W), all at a 640 × 640 input resolution. These results surpass those of all existing one-stage models, demonstrating superior performance in real-time animal pose estimation. AnimalRTPose is thus well suited to scenarios that require real-time animal behavior monitoring.
Further details on the model configuration and dataset are available on the AnimalRTPose project website.
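To put the reported throughput figures in perspective, the following minimal Python sketch converts each FPS value from the abstract into an approximate per-frame time. It assumes that per-frame time is simply the reciprocal of throughput, which ignores batching, pre-processing, and post-processing; the abstract does not state how FPS was measured. The helper names `latency_ms` and `meets_frame_rate`, and the 30 FPS camera rate, are illustrative assumptions and not part of AnimalRTPose.

```python
# Reported AnimalRTPose-N throughput (frames per second) at 640 x 640 input,
# copied from the abstract. Device names are used only as dictionary keys.
REPORTED_FPS = {
    "NVIDIA RTX 2080Ti": 476,
    "NVIDIA RTX 3090": 769,
    "NVIDIA A800": 1111,
    "NVIDIA Jetson AGX Orin Developer Kit": 196,
    "Raspberry Pi 5 with AI HAT+": 77,
    "Atlas 200I Developer Kit A2": 64,
}


def latency_ms(fps: float) -> float:
    """Approximate time per frame in milliseconds, assuming latency = 1 / FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def meets_frame_rate(fps: float, camera_fps: float = 30.0) -> bool:
    """Check whether throughput keeps up with a camera (30 FPS is a hypothetical default)."""
    return fps >= camera_fps


if __name__ == "__main__":
    for device, fps in REPORTED_FPS.items():
        status = "keeps up" if meets_frame_rate(fps) else "falls behind"
        print(f"{device}: {fps} FPS, about {latency_ms(fps):.2f} ms per frame, "
              f"{status} with a 30 FPS camera")
```

Under this simplification, every listed device processes frames faster than a 30 FPS camera delivers them, which is consistent with the abstract's framing of the model as suitable for real-time monitoring.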
Source journal
Neural Networks (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days
Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically-inspired artificial intelligence.