AnimalRTPose: Faster cross-species real-time animal pose estimation

IF 6.3 · Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks · Pub Date: 2025-06-10 · DOI: 10.1016/j.neunet.2025.107685

Xin Wu, Lianming Wang, Jipeng Huang

Neural Networks, Volume 190, Article 107685

Citations: 0

Abstract

Recent advances in computer vision have enabled sophisticated tools for analyzing complex animal behaviors, yet the diversity of animal morphology and the complexity of real-world environments pose significant challenges to real-time animal pose estimation. To address these challenges, we introduce AnimalRTPose, a one-stage model designed for cross-species real-time animal pose estimation. At its core, AnimalRTPose leverages CSPNeXt†, a novel backbone network that integrates depthwise separable convolution with skip connections for high-frequency feature extraction, a channel attention mechanism (CAM) to enhance the fusion of high-frequency and low-frequency features, and spatial pyramid pooling (SPP) to capture multi-scale contextual information. This architecture yields robust feature representations across varying spatial resolutions, improving adaptability to diverse species and environments. Additionally, AnimalRTPose incorporates an efficient multi-scale feature fusion module that dynamically balances local detail and global structural consistency, ensuring high accuracy and robustness in pose estimation. Designed for scalability and versatility, AnimalRTPose supports single-animal, multi-animal, cross-species, and few-shot scenarios. Specifically, with a 640 × 640 input resolution, AnimalRTPose-N achieves 476 FPS on an NVIDIA RTX 2080Ti, 769 FPS on an NVIDIA RTX 3090, and 1111 FPS on an NVIDIA A800, and it maintains high throughput on edge devices: 196 FPS on the NVIDIA Jetson™ AGX Orin Developer Kit (275 TOPS, 15 W to 60 W), 77 FPS on the Raspberry Pi 5 with AI HAT+ (26 TOPS, 25 W), and 64 FPS on the Atlas 200I Developer Kit A2 (8 TOPS, 24 W). These results surpass those of all existing one-stage models, demonstrating the model's superior performance in real-time animal pose estimation. AnimalRTPose is therefore well suited to scenarios that require real-time animal behavior monitoring. Further details on the model configuration and dataset are available on the AnimalRTPose project website.
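The abstract names the building blocks of CSPNeXt† but does not specify them. As a rough illustration only, the NumPy sketch below implements textbook versions of three of them on a single (channels, height, width) feature map: a 3 × 3 depthwise separable convolution with an identity skip connection, a squeeze-and-excitation-style channel gate (assumed here as a stand-in for the paper's CAM), and YOLO-style spatial pyramid pooling built from stride-1 max pooling at kernel sizes 5, 9 and 13 (the kernel sizes are an assumption). All function names, shapes and parameters are hypothetical; this is not the authors' implementation, and it omits normalization layers, biases and the CSP channel split.

```python
import numpy as np


def depthwise_conv3x3(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Depthwise 3x3 cross-correlation, stride 1, zero 'same' padding.

    x: (C, H, W) feature map; w: (C, 3, 3), one kernel per channel.
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + h, j:j + wd]
    return out


def pointwise_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution that mixes channels. w: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def dw_separable_block(x, w_dw, w_pw):
    """Depthwise -> pointwise -> ReLU, added back to the input (identity skip).

    The skip requires C_out == C_in, i.e. w_pw of shape (C, C).
    """
    y = pointwise_conv(depthwise_conv3x3(x, w_dw), w_pw)
    return x + np.maximum(y, 0.0)


def channel_attention(x, w1, w2):
    """SE-style gate: global average pool -> FC -> ReLU -> FC -> sigmoid -> rescale.

    w1: (C // r, C), w2: (C, C // r) for some reduction ratio r.
    """
    s = x.mean(axis=(1, 2))              # (C,) per-channel descriptor
    z = np.maximum(w1 @ s, 0.0)          # (C // r,) bottleneck
    a = 1.0 / (1.0 + np.exp(-(w2 @ z)))  # (C,) channel weights in (0, 1)
    return x * a[:, None, None]


def max_pool_same(x, k):
    """Stride-1 max pooling with odd kernel k; -inf padding never wins the max."""
    x = np.asarray(x, dtype=float)
    p = k // 2
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    out = np.full(x.shape, -np.inf)
    for i in range(k):
        for j in range(k):
            out = np.maximum(out, xp[:, i:i + h, j:j + w])
    return out


def spp(x, kernels=(5, 9, 13)):
    """Concatenate x with max-pooled copies at several kernel sizes along channels."""
    return np.concatenate([x] + [max_pool_same(x, k) for k in kernels], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16, 16))
    y = dw_separable_block(x, rng.standard_normal((8, 3, 3)), 0.1 * rng.standard_normal((8, 8)))
    y = channel_attention(y, rng.standard_normal((2, 8)), rng.standard_normal((8, 2)))
    print(spp(y).shape)  # (32, 16, 16)
```

Because the pooling is stride 1 with same padding, every pooled map keeps the input's spatial size and can be stacked with it along the channel axis; the larger kernels contribute progressively wider spatial context, which is the multi-scale effect the abstract attributes to SPP.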
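When budgeting a real-time monitoring pipeline, the reported throughputs can be converted into an average time per frame (1000 / FPS milliseconds). The sketch below does this for the figures quoted in the abstract. It assumes the FPS values reflect sequential single-frame processing (batched measurements would not translate directly into per-frame latency), and the hypothetical `max_streams` helper, which divides throughput by an assumed 30 FPS camera rate, ignores video decoding, pre- and post-processing and every other non-model overhead.

```python
"""Per-frame time budget implied by the AnimalRTPose-N throughputs in the abstract."""

# FPS at 640 x 640 input, copied from the abstract.
REPORTED_FPS = {
    "NVIDIA RTX 2080Ti": 476,
    "NVIDIA RTX 3090": 769,
    "NVIDIA A800": 1111,
    "NVIDIA Jetson AGX Orin Developer Kit": 196,
    "Raspberry Pi 5 with AI HAT+": 77,
    "Atlas 200I Developer Kit A2": 64,
}


def ms_per_frame(fps: float) -> float:
    """Average milliseconds available per frame at a given throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def max_streams(fps: float, camera_fps: float = 30.0) -> int:
    """Hypothetical upper bound on camera streams one device could serve,
    assuming throughput divides evenly and ignoring all non-model overhead."""
    if camera_fps <= 0:
        raise ValueError("camera_fps must be positive")
    return int(fps // camera_fps)


if __name__ == "__main__":
    for device, fps in REPORTED_FPS.items():
        print(f"{device:40s} {fps:5d} FPS  {ms_per_frame(fps):6.2f} ms/frame  "
              f"<= {max_streams(fps)} x 30 FPS streams")
```

For example, 64 FPS on the Atlas 200I Developer Kit A2 corresponds to about 15.6 ms per frame, comfortably inside the 33.3 ms interval of a 30 FPS camera.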




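Source journal: Neural Networks (Engineering & Technology / Computer Science: Artificial Intelligence)
CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days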

Journal description: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.