Comparing Human Annotation and Machine Learning Models for Optimizing Zebrafish Behavioral Classification in Seizure Analysis.

IF 2.3 · Zone 4 (Medicine) · JCR Q2 BIOCHEMICAL RESEARCH METHODS

Journal of Neuroscience Methods · Pub Date: 2025-10-21 · DOI: 10.1016/j.jneumeth.2025.110603

Barbara D Fontana, Camilla W Pretzel, Mariana L Müller, Kimberly Fontoura, Khadija A Mohammed, Eduarda T Saccol, Falco L Gonçalves, Angela E Uchoa, Carolina C Jardim, Isabella P Silva, Rossano M Silva, Hevelyn S Moraes, Cássio M Resmim, Julia Canzian, Denis B Rosemberg

Abstract

Background: Accurate and scalable behavioral annotation remains a challenge in behavioral neuroscience. Manual scoring is time-consuming, variable across annotators, and may overlook transient behaviors critical for phenotyping. By learning from annotated datasets, supervised machine learning (ML) enables automated classification of behavior with high consistency and reduced bias.
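Variability across annotators is usually quantified with a chance-corrected agreement statistic. The abstract does not say which statistic the authors used. The sketch below assumes Fleiss' kappa, a standard choice when more than two raters (here, twelve) label the same items. The function name `fleiss_kappa` and the example labels are illustrative and do not come from the study.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for frames that were each labeled by the same number of raters.

    ratings[i] holds the labels that every rater assigned to frame i.
    (Illustrative sketch; the study's actual agreement metric is not stated.)
    """
    if not ratings:
        raise ValueError("need at least one rated frame")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per frame")
    if any(len(row) != n_raters for row in ratings):
        raise ValueError("every frame must be rated by the same number of raters")

    n_frames = len(ratings)
    categories = sorted({label for row in ratings for label in row})
    # counts[i][c] = number of raters who gave frame i the label c (0 if none)
    counts = [Counter(row) for row in ratings]

    # Observed agreement: fraction of agreeing rater pairs per frame, averaged.
    p_frame = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_frame) / n_frames

    # Expected (chance) agreement from the overall label proportions.
    total = n_frames * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all raters used a single label")
    return (p_observed - p_expected) / (1 - p_expected)
```

For example, two raters who agree on half of four frames, with both labels used equally often overall, get a kappa of 0: their agreement is exactly what chance predicts.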

New method: We benchmarked five supervised ML algorithms, namely Random Forest, XGBoost, Support Vector Machine, k-Nearest Neighbors, and Multilayer Perceptron (MLP), and compared their predictions with expert human annotations of seizure-like behaviors in adult zebrafish. Twelve trained raters annotated more than 43,000 video frames, enabling direct comparison of model performance with human annotation. After frame-level analysis, we applied behavior-informed filters and then evaluated block-level temporal aggregation.
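The abstract states only that the behavior-informed filters were based on posture and velocity. A minimal sketch, assuming each predicted behavior has a simple rule on per-frame velocity and on one posture feature, and that frames failing their rule are treated as false positives and relabeled with a fallback class. The `Frame` fields, behavior names, thresholds, and the fallback label `"normal"` are all hypothetical placeholders, not values from the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Frame:
    label: str         # classifier prediction for this frame
    velocity: float    # hypothetical feature: swimming speed (e.g. cm/s)
    body_angle: float  # hypothetical posture feature: body curvature (degrees)


# Hypothetical rules: the condition a frame must meet for its predicted
# behavior to be accepted. Thresholds are placeholders, not from the paper.
RULES: Dict[str, Callable[[Frame], bool]] = {
    "hyperactivity": lambda f: f.velocity >= 5.0,
    "corkscrew": lambda f: f.body_angle >= 30.0,
}


def apply_behavior_filters(frames: List[Frame], fallback: str = "normal") -> List[str]:
    """Relabel predictions that violate their behavior's posture/velocity rule."""
    filtered = []
    for f in frames:
        rule = RULES.get(f.label)
        if rule is not None and not rule(f):
            filtered.append(fallback)  # prediction rejected as a likely false positive
        else:
            filtered.append(f.label)   # no rule for this label, or rule satisfied
    return filtered
```

A slow frame predicted as hyperactivity would be relabeled `"normal"`, while frames whose features support their label, and labels without a rule, pass through unchanged.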

Results: Annotation variability was driven by behavioral complexity, with ambiguous behaviors showing the lowest inter-rater agreement. Random Forest, XGBoost, and MLP achieved the highest accuracy, and post-processing based on posture and velocity improved classification by filtering out false positives. Block-level aggregation increased accuracy through temporal smoothing but masked short-lived behaviors that are critical for detecting subtle phenotypes.
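The trade-off between temporal smoothing and masking of short-lived behaviors is easy to see on a toy label sequence. The abstract does not describe the aggregation rule. This sketch assumes non-overlapping blocks of a fixed number of frames, each reduced to its most frequent frame label (majority vote). The block size and the labels `"seizure"`, `"normal"`, and `"spasm"` are hypothetical.

```python
from collections import Counter
from typing import List


def aggregate_blocks(frame_labels: List[str], block_size: int) -> List[str]:
    """Collapse consecutive, non-overlapping blocks of frames to one label each.

    Each block takes its most frequent label (majority vote). Ties go to the
    label seen first in the block, since Counter.most_common orders equal
    counts by first occurrence. A trailing partial block is still aggregated.
    """
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    blocks = []
    for start in range(0, len(frame_labels), block_size):
        block = frame_labels[start:start + block_size]
        blocks.append(Counter(block).most_common(1)[0][0])
    return blocks


# A one-frame "normal" error inside a seizure bout is smoothed away...
bout = ["seizure"] * 4 + ["normal"] + ["seizure"] * 5
print(aggregate_blocks(bout, block_size=10))  # ['seizure']

# ...but a genuine two-frame spasm is masked in exactly the same way.
brief = ["normal"] * 8 + ["spasm"] * 2 + ["normal"] * 10
print(aggregate_blocks(brief, block_size=10))    # ['normal', 'normal']
print(aggregate_blocks(brief, block_size=1)[8])  # spasm (frame-level view keeps it)
```

The same majority vote that corrects an isolated misclassification also erases any behavior shorter than half a block, which is why the choice of block size matters when subtle phenotypes are of interest.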

Comparison with existing methods: Most zebrafish seizure studies rely on manual scoring or single-model ML applications. Direct comparisons between multiple ML algorithms and human annotations are rare. Our study uniquely integrates large-scale manual scoring with model benchmarking and temporal resolution strategies, offering insight into reproducibility and scalability in behavioral phenotyping.

Conclusions: This study advances automated behavioral analysis in zebrafish by demonstrating the strengths and limitations of machine learning compared to human annotation, and emphasizes how choices in temporal resolution and behavioral classification influence reproducibility and interpretability.

Source journal

Journal of Neuroscience Methods (Medicine: Neuroscience)

CiteScore: 7.10

Self-citation rate: 3.30%

Articles published: 226

Review time: 52 days

About the journal: The Journal of Neuroscience Methods publishes papers that describe new methods specifically for neuroscience research conducted in invertebrates, vertebrates or in man. Major methodological improvements or important refinements of established neuroscience methods are also considered for publication. The Journal's Scope includes all aspects of contemporary neuroscience research, including anatomical, behavioural, biochemical, cellular, computational, molecular, invasive and non-invasive imaging, optogenetic, and physiological research investigations.