A review on vision-centric coarse to fine-grained animal action recognition

Impact Factor: 13.9. Zone 2 (Computer Science). JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review, vol. 59, no. 5. Pub Date: 2026-03-16. Epub Date: 2026-04-08. DOI: 10.1007/s10462-026-11526-5

Ali Zia, Renuka Sharma, Abdelwahed Khamis, Usman Ali, Xuesong Li, Muhammad Husnain, Numan Shafi, Saeed Anwar, Imran Raza, Muhammad Hasan Jamal, Sabine Schmoelzl, Eric Stone, Lars Petersson, Vivien Rolland


Abstract

This review provides an in-depth exploration of the field of animal action recognition, focusing on coarse-grained (CG) and fine-grained (FG) techniques. The primary aim is to examine the current state of research in animal behaviour recognition and to elucidate the unique challenges associated with recognising subtle animal actions in outdoor environments. These challenges differ significantly from those encountered in human action recognition due to factors such as non-rigid body structures, frequent occlusions, and the lack of large-scale, annotated datasets. This review underscores the critical differences between human and animal action recognition. While inspired by progress in the human domain, animal action recognition presents unique challenges due to high intra-species variability, complex environmental interactions, and unstructured datasets that human-centric models cannot fully address. Recent multimodal frameworks such as ARTEMIS and MSQNet exemplify state-of-the-art progress by integrating textual cues derived from video with visual and audio modalities. When considered alongside established spatio-temporal architectures like SlowFast, these developments signal a shift toward richer multimodal paradigms in behaviour analysis. By assessing the strengths and weaknesses of current methodologies and introducing a recently published dataset, the review outlines future directions for advancing fine-grained action recognition, aiming to improve accuracy and generalisability in behaviour analysis across species. This review extends beyond earlier reviews by offering the first systematic treatment of coarse-grained (CG) and fine-grained (FG) action recognition in animals.

Source journal: Artificial Intelligence Review (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 22.00
Self-citation rate: 3.30%
Articles published: 194
Review time: 5.3 months

About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.