How to treat mixed behavior segments in supervised machine learning of behavioural modes from inertial measurement data.

IF 3.4, Zone 1 (Biology), JCR Q2 (Ecology)

Movement Ecology Pub Date : 2024-06-10 DOI:10.1186/s40462-024-00485-7

Yehezkel S Resheff, Hanna M Bensch, Markus Zöttl, Roi Harel, Akiko Matsumoto-Oda, Margaret C Crofoot, Sara Gomez, Luca Börger, Shay Rotics

Movement Ecology, vol. 12, no. 1, p. 44 (Journal Article). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11165886/pdf/ DOI link: https://doi.org/10.1186/s40462-024-00485-7

Citations: 0

Abstract

The application of supervised machine learning methods to identify behavioural modes from inertial measurements of bio-loggers has become a standard tool in behavioural ecology. Several design choices can affect the accuracy of identifying the behavioural modes. One such choice is whether to include or exclude segments consisting of more than a single behaviour (mixed segments) in the machine learning model training data. Currently, the common practice is to ignore such segments during model training. In this paper we test the hypothesis that including mixed segments in model training will improve accuracy, as the model would perform better in identifying them in the test data. We test this hypothesis using a series of data simulations on four datasets of accelerometer data coupled with behaviour observations, obtained from four study species (Damaraland mole-rats, meerkats, olive baboons, polar bears). Results show that when a substantial proportion of the test data are mixed behaviour segments (above ~10%), including mixed segments in machine learning model training improves the accuracy of classification. These results were consistent across the four study species, and robust to changes in segment length, sample size, and degree of mixture within the mixed segments. However, we also find that in some cases (particularly in baboons) models trained with mixed segments show reduced accuracy in classifying test data containing only single behaviour (pure) segments, compared to models trained without mixed segments. Based on these results, we recommend that when the classification model is expected to deal with a substantial proportion of mixed behaviour segments (> 10%), it is beneficial to include them in model training; otherwise, it is unnecessary but also not harmful. The exception is when there is a basis to assume that the training data contains a higher rate of mixed segments than the actual (unobserved) data to be classified. Such a situation may occur particularly when training data are collected in captivity and used to classify data from the wild. In this case, excess inclusion of mixed segments in training data should probably be avoided.
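The comparison described in the abstract (training with or without mixed segments, then testing on data with varying shares of mixed segments) can be illustrated with a small, purely synthetic sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: the one-dimensional Gaussian "accelerometer" signals, the `BEHAVIOURS` parameters, the rule that a mixed segment takes the label of the behaviour covering most of its samples, and the nearest-centroid classifier standing in for whatever model a real study would use.

```python
import random
import statistics

# Hypothetical per-behaviour signal parameters (mean, spread).
# Synthetic values chosen for illustration, not taken from the paper.
BEHAVIOURS = {"rest": (0.0, 0.1), "walk": (1.0, 0.4), "run": (2.5, 0.8)}
SEGMENT_LEN = 40  # samples per segment (assumed)


def pure_segment(rng, behaviour):
    """Simulate a segment containing a single behaviour."""
    mu, sigma = BEHAVIOURS[behaviour]
    return [rng.gauss(mu, sigma) for _ in range(SEGMENT_LEN)]


def mixed_segment(rng, first, second, share_first):
    """Join two behaviours; label by the one covering most samples (assumed rule)."""
    cut = round(SEGMENT_LEN * share_first)
    signal = pure_segment(rng, first)[:cut] + pure_segment(rng, second)[cut:]
    label = first if share_first >= 0.5 else second
    return signal, label


def features(signal):
    """Two summary features per segment: mean and standard deviation."""
    return (statistics.fmean(signal), statistics.pstdev(signal))


def make_dataset(rng, n_segments, mixed_fraction):
    """Return (features, label, is_mixed) rows with the requested mixed share."""
    n_mixed = round(n_segments * mixed_fraction)
    names = list(BEHAVIOURS)
    rows = []
    for i in range(n_segments):
        if i < n_mixed:
            first, second = rng.sample(names, 2)
            share = rng.uniform(0.55, 0.9)  # degree of mixture within the segment
            signal, label = mixed_segment(rng, first, second, share)
            rows.append((features(signal), label, True))
        else:
            label = rng.choice(names)
            rows.append((features(pure_segment(rng, label)), label, False))
    return rows


def fit_centroids(rows):
    """Nearest-centroid stand-in model: mean feature vector per label."""
    grouped = {}
    for feats, label, _ in rows:
        grouped.setdefault(label, []).append(feats)
    return {
        label: tuple(statistics.fmean(col) for col in zip(*vals))
        for label, vals in grouped.items()
    }


def predict(centroids, feats):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(feats, centroids[lab])),
    )


def accuracy(centroids, rows):
    hits = sum(predict(centroids, feats) == label for feats, label, _ in rows)
    return hits / len(rows)


rng = random.Random(0)
train_pure = make_dataset(rng, 300, mixed_fraction=0.0)
train_with_mixed = make_dataset(rng, 300, mixed_fraction=0.3)
model_pure = fit_centroids(train_pure)
model_mixed = fit_centroids(train_with_mixed)

# Compare both models on test sets with an increasing share of mixed segments.
for test_fraction in (0.0, 0.1, 0.3, 0.5):
    test = make_dataset(rng, 500, mixed_fraction=test_fraction)
    print(
        test_fraction,
        round(accuracy(model_pure, test), 3),
        round(accuracy(model_mixed, test), 3),
    )
```

The loop prints, for each share of mixed segments in the test data, the accuracy of the model trained on pure segments only and of the model trained with mixed segments. Numbers from such a toy setup say nothing about real bio-logger data; the sketch only makes the structure of the comparison concrete.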


Source journal

Movement Ecology (Agricultural and Biological Sciences: Ecology, Evolution, Behavior and Systematics)

CiteScore: 6.60

Self-citation rate: 4.90%

Publication count: (not listed)

Review time: 23 weeks

About the journal: Movement Ecology is an open-access interdisciplinary journal publishing novel insights from empirical and theoretical approaches into the ecology of movement of the whole organism (animals, plants or microorganisms) as the central theme. We welcome manuscripts on any taxa and any movement phenomena (e.g. foraging, dispersal and seasonal migration) addressing important research questions on the patterns, mechanisms, causes and consequences of organismal movement. Manuscripts will be rigorously peer-reviewed to ensure novelty and high quality.