Quantifying Facial Gestures Using Deep Learning in a New World Monkey

IF 2.0 · CAS Tier 3 (Biology) · JCR Q1 (Zoology)
Filippo Carugati, Dayanna Curagi Gorio, Chiara De Gregorio, Daria Valente, Valeria Ferrario, Brice Lefaux, Olivier Friard, Marco Gamba
{"title":"在一个新的世界猴子中使用深度学习来量化面部手势","authors":"Filippo Carugati,&nbsp;Dayanna Curagi Gorio,&nbsp;Chiara De Gregorio,&nbsp;Daria Valente,&nbsp;Valeria Ferrario,&nbsp;Brice Lefaux,&nbsp;Olivier Friard,&nbsp;Marco Gamba","doi":"10.1002/ajp.70013","DOIUrl":null,"url":null,"abstract":"<p>Facial gestures are a crucial component of primate multimodal communication. However, current methodologies for extracting facial data from video recordings are labor-intensive and prone to human subjectivity. Although automatic tools for this task are still in their infancy, deep learning techniques are revolutionizing animal behavior research. This study explores the distinctiveness of facial gestures in cotton-top tamarins, quantified using markerless pose estimation algorithms. From footage of captive individuals, we extracted and manually labeled frames to develop a model that can recognize a custom set of landmarks positioned on the face of the target species. The trained model predicted landmark positions and subsequently transformed them into distance matrices representing landmarks' spatial distributions within each frame. We employed three competitive machine learning classifiers to assess the ability to automatically discriminate facial configurations that cooccur with vocal emissions and are associated with different behavioral contexts. Initial analysis showed correct classification rates exceeding 80%, suggesting that voiced facial configurations are highly distinctive from unvoiced ones. Our findings also demonstrated varying context specificity of facial gestures, with the highest classification accuracy observed during yawning, social activity, and resting. This study highlights the potential of markerless pose estimation for advancing the study of primate multimodal communication, even in challenging species such as cotton-top tamarins. The ability to automatically distinguish facial gestures in different behavioral contexts represents a critical step in developing automated tools for extracting behavioral cues from raw video data.</p>","PeriodicalId":7662,"journal":{"name":"American Journal of Primatology","volume":"87 3","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ajp.70013","citationCount":"0","resultStr":"{\"title\":\"Quantifying Facial Gestures Using Deep Learning in a New World Monkey\",\"authors\":\"Filippo Carugati,&nbsp;Dayanna Curagi Gorio,&nbsp;Chiara De Gregorio,&nbsp;Daria Valente,&nbsp;Valeria Ferrario,&nbsp;Brice Lefaux,&nbsp;Olivier Friard,&nbsp;Marco Gamba\",\"doi\":\"10.1002/ajp.70013\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Facial gestures are a crucial component of primate multimodal communication. However, current methodologies for extracting facial data from video recordings are labor-intensive and prone to human subjectivity. Although automatic tools for this task are still in their infancy, deep learning techniques are revolutionizing animal behavior research. This study explores the distinctiveness of facial gestures in cotton-top tamarins, quantified using markerless pose estimation algorithms. From footage of captive individuals, we extracted and manually labeled frames to develop a model that can recognize a custom set of landmarks positioned on the face of the target species. 
The trained model predicted landmark positions and subsequently transformed them into distance matrices representing landmarks' spatial distributions within each frame. We employed three competitive machine learning classifiers to assess the ability to automatically discriminate facial configurations that cooccur with vocal emissions and are associated with different behavioral contexts. Initial analysis showed correct classification rates exceeding 80%, suggesting that voiced facial configurations are highly distinctive from unvoiced ones. Our findings also demonstrated varying context specificity of facial gestures, with the highest classification accuracy observed during yawning, social activity, and resting. This study highlights the potential of markerless pose estimation for advancing the study of primate multimodal communication, even in challenging species such as cotton-top tamarins. The ability to automatically distinguish facial gestures in different behavioral contexts represents a critical step in developing automated tools for extracting behavioral cues from raw video data.</p>\",\"PeriodicalId\":7662,\"journal\":{\"name\":\"American Journal of Primatology\",\"volume\":\"87 3\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ajp.70013\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Primatology\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/ajp.70013\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ZOOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Primatology","FirstCategoryId":"99","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/ajp.70013","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ZOOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

Facial gestures are a crucial component of primate multimodal communication. However, current methodologies for extracting facial data from video recordings are labor-intensive and prone to human subjectivity. Although automatic tools for this task are still in their infancy, deep learning techniques are revolutionizing animal behavior research. This study explores the distinctiveness of facial gestures in cotton-top tamarins, quantified using markerless pose estimation algorithms. From footage of captive individuals, we extracted and manually labeled frames to develop a model that can recognize a custom set of landmarks positioned on the face of the target species. The trained model predicted landmark positions and subsequently transformed them into distance matrices representing landmarks' spatial distributions within each frame. We employed three competitive machine learning classifiers to assess the ability to automatically discriminate facial configurations that co-occur with vocal emissions and are associated with different behavioral contexts. Initial analysis showed correct classification rates exceeding 80%, suggesting that voiced facial configurations are highly distinct from unvoiced ones. Our findings also demonstrated varying context specificity of facial gestures, with the highest classification accuracy observed during yawning, social activity, and resting. This study highlights the potential of markerless pose estimation for advancing the study of primate multimodal communication, even in challenging species such as cotton-top tamarins. The ability to automatically distinguish facial gestures in different behavioral contexts represents a critical step in developing automated tools for extracting behavioral cues from raw video data.
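The pipeline the abstract describes (predicted landmark coordinates → pairwise distance matrices per frame → supervised classification) can be sketched compactly. The sketch below is illustrative only: the abstract does not name the pose-estimation toolkit or the three classifiers, so the synthetic data, the 15-landmark layout, and the choice of scikit-learn's RandomForestClassifier are assumptions standing in for the authors' actual setup; the voiced/unvoiced binary labels mirror the task reported in the abstract.

```python
# Minimal sketch of the landmark-to-classifier pipeline, assuming landmarks
# have already been predicted by a markerless pose-estimation model.
# All data and model choices here are hypothetical placeholders.
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def frame_to_distance_vector(landmarks: np.ndarray) -> np.ndarray:
    """Turn one frame's (n_landmarks, 2) x/y coordinates into the condensed
    vector of all pairwise landmark distances, used as a per-frame feature."""
    return pdist(landmarks)  # length: n_landmarks * (n_landmarks - 1) / 2

# Hypothetical dataset: 500 frames, 15 facial landmarks,
# binary labels for voiced vs. unvoiced facial configurations.
rng = np.random.default_rng(0)
frames = rng.random((500, 15, 2))
labels = rng.integers(0, 2, size=500)

X = np.stack([frame_to_distance_vector(f) for f in frames])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, labels, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```

One appeal of this representation is that pairwise distances are invariant to the face's position and orientation within the frame, making them a convenient frame-level feature for comparing facial configurations across videos.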

Source journal: American Journal of Primatology
CiteScore: 4.50
Self-citation rate: 8.30%
Articles published: 103
Review turnaround: 4-8 weeks
Journal description: The objective of the American Journal of Primatology is to provide a forum for the exchange of ideas and findings among primatologists and to convey our increasing understanding of this order of animals to specialists and interested readers alike. Primatology is an unusual science in that its practitioners work in a wide variety of departments and institutions, live in countries throughout the world, and carry out a vast range of research procedures. Whether we are anthropologists, psychologists, biologists, or medical researchers, whether we live in Japan, Kenya, Brazil, or the United States, whether we conduct naturalistic observations in the field or experiments in the lab, we are united in our goal of better understanding primates. Our studies of nonhuman primates are of interest to scientists in many other disciplines ranging from entomology to sociology.