How to treat mixed behavior segments in supervised machine learning of behavioural modes from inertial measurement data
Yehezkel S Resheff, Hanna M Bensch, Markus Zöttl, Roi Harel, Akiko Matsumoto-Oda, Margaret C Crofoot, Sara Gomez, Luca Börger, Shay Rotics
Movement Ecology 12(1):44, published 2024-06-10 (Journal Article). Impact factor: 3.4; JCR: Q2 (Ecology).
DOI: https://doi.org/10.1186/s40462-024-00485-7
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11165886/pdf/
Citations: 0
Abstract
The application of supervised machine learning methods to identify behavioural modes from inertial measurements of bio-loggers has become a standard tool in behavioural ecology. Several design choices can affect the accuracy of identifying the behavioural modes. One such choice is whether to include segments consisting of more than a single behaviour (mixed segments) in the training data of the machine learning model. Currently, the common practice is to ignore such segments during model training. In this paper we test the hypothesis that including mixed segments in model training will improve accuracy, as the model would perform better in identifying them in the test data. We test this hypothesis using a series of data simulations on four datasets of accelerometer data coupled with behaviour observations, obtained from four study species (Damaraland mole-rats, meerkats, olive baboons, polar bears). Results show that when a substantial proportion of the test data are mixed behaviour segments (above ~10%), including mixed segments in model training improves classification accuracy. These results were consistent across the four study species, and robust to changes in segment length, sample size, and degree of mixture within the mixed segments. However, we also find that in some cases (particularly in baboons) models trained with mixed segments show reduced accuracy in classifying test data containing only single-behaviour (pure) segments, compared to models trained without mixed segments. Based on these results, we recommend that when the classification model is expected to deal with a substantial proportion of mixed behaviour segments (>10%), it is beneficial to include them in model training; otherwise, including them is unnecessary but also not harmful. The exception is when there is a basis to assume that the training data contain a higher rate of mixed segments than the actual (unobserved) data to be classified. Such a situation may occur particularly when training data are collected in captivity and used to classify data from the wild. In this case, excess inclusion of mixed segments in training data should probably be avoided.
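The kind of simulation described above, in which the share of mixed segments in a dataset is controlled, can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not say how mixed segments were constructed or labelled, so the splicing approach, the majority-label rule, the behaviour names, and the helper names `make_mixed_segment` and `build_dataset` are all assumptions made for illustration.

```python
import random
from collections import Counter


def make_mixed_segment(seg_a, seg_b, fraction_a):
    """Splice two pure segments: the first part from seg_a, the rest from seg_b."""
    n = min(len(seg_a["samples"]), len(seg_b["samples"]))
    cut = round(n * fraction_a)
    samples = seg_a["samples"][:cut] + seg_b["samples"][cut:n]
    labels = [seg_a["label"]] * cut + [seg_b["label"]] * (n - cut)
    # Assumption: a mixed segment is labelled with its majority behaviour.
    majority = Counter(labels).most_common(1)[0][0]
    return {"samples": samples, "label": majority, "mixed": True}


def build_dataset(pure_segments, n_segments, mixed_proportion,
                  fraction_a=0.7, seed=0):
    """Draw n_segments segments, a fixed share of which are mixed.

    Requires pure segments from at least two different behaviours.
    """
    rng = random.Random(seed)
    n_mixed = round(n_segments * mixed_proportion)
    dataset = []
    for i in range(n_segments):
        if i < n_mixed:
            seg_a = rng.choice(pure_segments)
            # The second behaviour must differ from the first one.
            others = [s for s in pure_segments if s["label"] != seg_a["label"]]
            seg_b = rng.choice(others)
            dataset.append(make_mixed_segment(seg_a, seg_b, fraction_a))
        else:
            seg = rng.choice(pure_segments)
            dataset.append({"samples": list(seg["samples"]),
                            "label": seg["label"], "mixed": False})
    rng.shuffle(dataset)
    return dataset


# Toy pure segments (hypothetical values standing in for accelerometer data).
pure = [
    {"samples": [0.1] * 10, "label": "resting"},
    {"samples": [0.9] * 10, "label": "walking"},
    {"samples": [0.5] * 10, "label": "foraging"},
]

# A test set in which 25% of the segments are mixed.
test_set = build_dataset(pure, n_segments=20, mixed_proportion=0.25)
```

Calling `build_dataset` with `mixed_proportion=0.0` for training and a range of values for testing (and then repeating with mixed segments included in training) would give the two conditions compared in the abstract. The `fraction_a` parameter stands in for the degree of mixture within a mixed segment.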
Movement Ecology
Agricultural and Biological Sciences - Ecology, Evolution, Behavior and Systematics
CiteScore: 6.60
Self-citation rate: 4.90%
Articles published: 47
Review time: 23 weeks
About the journal:
Movement Ecology is an open-access interdisciplinary journal publishing novel insights from empirical and theoretical approaches into the ecology of movement of the whole organism - either animals, plants or microorganisms - as the central theme. We welcome manuscripts on any taxa and any movement phenomena (e.g. foraging, dispersal and seasonal migration) addressing important research questions on the patterns, mechanisms, causes and consequences of organismal movement. Manuscripts will be rigorously peer-reviewed to ensure novelty and high quality.