Video Foundation Models for Animal Behavior Analysis

bioRxiv - Animal Behavior and Cognition. Pub Date: 2024-07-31. DOI: 10.1101/2024.07.30.605655

Jennifer J Sun, Hao Zhou, Long Zhao, Liangzhe Yuan, Bryan Seybold, David Hendon, Florian Schroff, David A Ross, Hartwig Adam, Bo Hu, Ting Liu

Abstract

Computational approaches leveraging computer vision and machine learning have transformed the quantification of animal behavior from video. However, existing methods often rely on task-specific features or models, which struggle to generalize across diverse datasets and tasks. Recent advances in machine learning, particularly the emergence of vision foundation models, i.e., large-scale models pre-trained on massive, diverse visual repositories, offer a way to tackle these challenges. Here, we investigate the potential of frozen video foundation models across a range of behavior analysis tasks, including classification, retrieval, and localization. We use a single, frozen model to extract general-purpose representations from video data, and perform extensive evaluations on diverse open-source animal behavior datasets. Our results demonstrate that features with minimal adaptation from foundation models achieve competitive performance compared to existing methods specifically designed for each dataset, across species, behaviors, and experimental contexts. This highlights the potential of frozen video foundation models as a powerful and accessible backbone for automated behavior analysis, with the ability to accelerate research across diverse fields, from neuroscience to ethology and ecology.
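
The frozen-backbone workflow described in the abstract can be illustrated with a minimal sketch, under loudly stated assumptions. The encoder below is a hypothetical stand-in (a fixed random projection over mean-pooled frames), not the model used in the paper, and the data are synthetic. The ridge-regression probe and cosine-similarity retrieval are illustrative choices for "minimal adaptation"; the abstract does not specify which adaptation method the authors used. All function names are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: stand-in for a frozen video encoder. A real foundation model
# would replace this; the weights here are fixed and never trained.
FROZEN_W = rng.standard_normal((48, 16)) / np.sqrt(48)

def frozen_embed(clips):
    """Map clips of shape (n, frames, 48) to (n, 16) embeddings."""
    pooled = clips.mean(axis=1)          # average over the frame axis
    return np.tanh(pooled @ FROZEN_W)    # fixed projection, no learning

def fit_linear_probe(feats, labels, n_classes, l2=1e-2):
    """Ridge regression to one-hot targets: the only trained parameters."""
    x = np.hstack([feats, np.ones((len(feats), 1))])   # append bias column
    y = np.eye(n_classes)[labels]
    return np.linalg.solve(x.T @ x + l2 * np.eye(x.shape[1]), x.T @ y)

def predict(probe, feats):
    """Return the class with the highest linear score per embedding."""
    x = np.hstack([feats, np.ones((len(feats), 1))])
    return (x @ probe).argmax(axis=1)

def retrieve(query, gallery, k=3):
    """Indices of the k gallery embeddings most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))[:k]

def make_clips(n, label):
    """Synthetic 'behavior' clips: a class-specific pattern plus frame noise."""
    base = np.zeros(48)
    base[label * 24:(label + 1) * 24] = 1.0
    return base + 0.3 * rng.standard_normal((n, 8, 48))

# Build a toy two-class dataset and embed it once with the frozen encoder.
train_clips = np.concatenate([make_clips(40, 0), make_clips(40, 1)])
train_labels = np.array([0] * 40 + [1] * 40)
test_clips = np.concatenate([make_clips(20, 0), make_clips(20, 1)])
test_labels = np.array([0] * 20 + [1] * 20)

train_feats = frozen_embed(train_clips)
test_feats = frozen_embed(test_clips)

# Classification: train only the lightweight probe on frozen features.
probe = fit_linear_probe(train_feats, train_labels, n_classes=2)
accuracy = (predict(probe, test_feats) == test_labels).mean()

# Retrieval: nearest training clips to one class-1 test clip.
hits = retrieve(test_feats[-1], train_feats, k=3)
print("probe accuracy:", accuracy)
print("retrieved labels:", train_labels[hits])
```

The design point this sketch is meant to convey is that embeddings are computed once and reused: classification trains only a small linear map, and retrieval needs no training at all. Localization, the third task named in the abstract, is not covered here.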
