Barbara D Fontana, Camilla W Pretzel, Mariana L Müller, Kimberly Fontoura, Khadija A Mohammed, Eduarda T Saccol, Falco L Gonçalves, Angela E Uchoa, Carolina C Jardim, Isabella P Silva, Rossano M Silva, Hevelyn S Moraes, Cássio M Resmim, Julia Canzian, Denis B Rosemberg
{"title":"比较人类标注和机器学习模型在优化斑马鱼癫痫分析行为分类中的应用。","authors":"Barbara D Fontana, Camilla W Pretzel, Mariana L Müller, Kimberly Fontoura, Khadija A Mohammed, Eduarda T Saccol, Falco L Gonçalves, Angela E Uchoa, Carolina C Jardim, Isabella P Silva, Rossano M Silva, Hevelyn S Moraes, Cássio M Resmim, Julia Canzian, Denis B Rosemberg","doi":"10.1016/j.jneumeth.2025.110603","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Accurate and scalable behavioral annotation remains a challenge in behavioral neuroscience. Manual scoring is time-consuming, variable across annotators, and may overlook transient behaviors critical for phenotyping. By learning from annotated datasets, supervised machine learning (ML) enables automated classification of behavior with high consistency and reduced bias.</p><p><strong>New method: </strong>We benchmarked five supervised ML algorithms, Random Forest, XGBoost, Support Vector Machine, k-Nearest Neighbors, and Multilayer Perceptron (MLP), and compared data against expert human annotations of seizure-like behaviors in adult zebrafish. Twelve trained raters annotated over 43,000 video frames, enabling direct comparison of model performance with human annotation. After frame-level analysis, we also applied behavior-informed filters and then evaluated block-level temporal aggregation.</p><p><strong>Results: </strong>Annotation variability was driven by behavioral complexity, with ambiguous behaviors showing the lowest agreement. Random Forest, XGBoost, and MLP achieved the highest accuracy and post-processing based on posture and velocity improved classification by filtering false positives. Block-level aggregation enhanced accuracy through temporal smoothing but masked short-lived behaviors critical for detecting subtle phenotypes.</p><p><strong>Comparison with existing methods: </strong>Most zebrafish seizure studies rely on manual scoring or single-model ML applications. Direct comparisons between multiple ML algorithms and human annotations are rare. Our study uniquely integrates large-scale manual scoring with model benchmarking and temporal resolution strategies, offering insight into reproducibility and scalability in behavioral phenotyping.</p><p><strong>Conclusions: </strong>This study advances automated behavioral analysis in zebrafish by demonstrating the strengths and limitations of machine learning compared to human annotation, and emphasizes how choices in temporal resolution and behavioral classification influence reproducibility and interpretability.</p>","PeriodicalId":16415,"journal":{"name":"Journal of Neuroscience Methods","volume":" ","pages":"110603"},"PeriodicalIF":2.3000,"publicationDate":"2025-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Comparing Human Annotation and Machine Learning Models for Optimizing Zebrafish Behavioral Classification in Seizure Analysis.\",\"authors\":\"Barbara D Fontana, Camilla W Pretzel, Mariana L Müller, Kimberly Fontoura, Khadija A Mohammed, Eduarda T Saccol, Falco L Gonçalves, Angela E Uchoa, Carolina C Jardim, Isabella P Silva, Rossano M Silva, Hevelyn S Moraes, Cássio M Resmim, Julia Canzian, Denis B Rosemberg\",\"doi\":\"10.1016/j.jneumeth.2025.110603\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Accurate and scalable behavioral annotation remains a challenge in behavioral neuroscience. 
Manual scoring is time-consuming, variable across annotators, and may overlook transient behaviors critical for phenotyping. By learning from annotated datasets, supervised machine learning (ML) enables automated classification of behavior with high consistency and reduced bias.</p><p><strong>New method: </strong>We benchmarked five supervised ML algorithms, Random Forest, XGBoost, Support Vector Machine, k-Nearest Neighbors, and Multilayer Perceptron (MLP), and compared data against expert human annotations of seizure-like behaviors in adult zebrafish. Twelve trained raters annotated over 43,000 video frames, enabling direct comparison of model performance with human annotation. After frame-level analysis, we also applied behavior-informed filters and then evaluated block-level temporal aggregation.</p><p><strong>Results: </strong>Annotation variability was driven by behavioral complexity, with ambiguous behaviors showing the lowest agreement. Random Forest, XGBoost, and MLP achieved the highest accuracy and post-processing based on posture and velocity improved classification by filtering false positives. Block-level aggregation enhanced accuracy through temporal smoothing but masked short-lived behaviors critical for detecting subtle phenotypes.</p><p><strong>Comparison with existing methods: </strong>Most zebrafish seizure studies rely on manual scoring or single-model ML applications. Direct comparisons between multiple ML algorithms and human annotations are rare. Our study uniquely integrates large-scale manual scoring with model benchmarking and temporal resolution strategies, offering insight into reproducibility and scalability in behavioral phenotyping.</p><p><strong>Conclusions: </strong>This study advances automated behavioral analysis in zebrafish by demonstrating the strengths and limitations of machine learning compared to human annotation, and emphasizes how choices in temporal resolution and behavioral classification influence reproducibility and interpretability.</p>\",\"PeriodicalId\":16415,\"journal\":{\"name\":\"Journal of Neuroscience Methods\",\"volume\":\" \",\"pages\":\"110603\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-10-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Neuroscience Methods\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.jneumeth.2025.110603\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Neuroscience Methods","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.jneumeth.2025.110603","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Comparing Human Annotation and Machine Learning Models for Optimizing Zebrafish Behavioral Classification in Seizure Analysis.
Background: Accurate and scalable behavioral annotation remains a challenge in behavioral neuroscience. Manual scoring is time-consuming, variable across annotators, and may overlook transient behaviors critical for phenotyping. By learning from annotated datasets, supervised machine learning (ML) enables automated classification of behavior with high consistency and reduced bias.
New method: We benchmarked five supervised ML algorithms, Random Forest, XGBoost, Support Vector Machine, k-Nearest Neighbors, and Multilayer Perceptron (MLP), against expert human annotations of seizure-like behaviors in adult zebrafish. Twelve trained raters annotated over 43,000 video frames, enabling direct comparison of model performance with human annotation. Beyond frame-level analysis, we also applied behavior-informed filters and then evaluated block-level temporal aggregation.
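A minimal sketch of how such a frame-level benchmark could be set up with scikit-learn and XGBoost, assuming pose-derived features (e.g., velocity, body angle) and consensus human labels have already been extracted per frame; the feature columns, input file name, and hyperparameters are illustrative assumptions, not the authors' pipeline.

```python
# Hypothetical frame-level benchmark of five supervised classifiers against
# human-annotated behavior labels. Feature columns, the input file, and
# hyperparameters are illustrative assumptions, not the authors' pipeline.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from xgboost import XGBClassifier

frames = pd.read_csv("frame_features.csv")          # one row per annotated video frame
X = frames[["velocity", "body_angle", "x", "y"]]    # assumed pose-derived features
y = LabelEncoder().fit_transform(frames["label"])   # consensus human label per frame

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "XGBoost": XGBClassifier(random_state=0),
    "SVM": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=15),
    "MLP": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: accuracy={accuracy_score(y_te, pred):.3f}, "
          f"macro-F1={f1_score(y_te, pred, average='macro'):.3f}")
```

Frame-level accuracy and macro-F1 against held-out human labels are the kind of metrics that allow the algorithms to be ranked side by side.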
Results: Annotation variability was driven by behavioral complexity, with ambiguous behaviors showing the lowest agreement. Random Forest, XGBoost, and MLP achieved the highest accuracy, and post-processing based on posture and velocity improved classification by filtering false positives. Block-level aggregation enhanced accuracy through temporal smoothing but masked short-lived behaviors critical for detecting subtle phenotypes.
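As an illustration of the post-processing and block-level aggregation described above, the sketch below relabels predicted seizure-like frames whose velocity and posture are implausible, then collapses frame predictions into fixed-length blocks by majority vote; the thresholds, block duration, and column names are hypothetical, not values from the study.

```python
# Hypothetical posture/velocity-based filtering of predicted frames and
# block-level temporal aggregation by majority vote. Thresholds, block
# duration, and column names are illustrative assumptions.
import numpy as np
import pandas as pd

def filter_predictions(df, min_velocity=2.0, max_body_angle=45.0):
    """Relabel predicted 'seizure' frames as 'normal' when velocity and
    posture are inconsistent with seizure-like swimming (false positives)."""
    out = df.copy()
    implausible = (
        (out["pred"] == "seizure")
        & ((out["velocity"] < min_velocity) | (out["body_angle"] > max_body_angle))
    )
    out.loc[implausible, "pred"] = "normal"
    return out

def aggregate_blocks(df, fps=30, block_seconds=1.0):
    """Collapse frame-level predictions into fixed-length blocks, assigning
    each block its most frequent predicted label (temporal smoothing)."""
    block_len = int(fps * block_seconds)
    blocks = df.index // block_len
    return df.groupby(blocks)["pred"].agg(lambda s: s.mode().iloc[0])

# Toy example with random kinematics, just to exercise the two steps.
frames = pd.DataFrame({
    "pred": ["seizure", "seizure", "normal", "seizure"] * 30,
    "velocity": np.random.uniform(0, 10, 120),
    "body_angle": np.random.uniform(0, 90, 120),
})
filtered = filter_predictions(frames)
print(aggregate_blocks(filtered).head())
```

The majority vote is what produces the temporal smoothing: blocks become robust to isolated misclassified frames, but any behavior shorter than the block is voted away, which is exactly the trade-off noted in the results.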
Comparison with existing methods: Most zebrafish seizure studies rely on manual scoring or single-model ML applications. Direct comparisons between multiple ML algorithms and human annotations are rare. Our study uniquely integrates large-scale manual scoring with model benchmarking and temporal resolution strategies, offering insight into reproducibility and scalability in behavioral phenotyping.
Conclusions: This study advances automated behavioral analysis in zebrafish by demonstrating the strengths and limitations of machine learning compared to human annotation, and emphasizes how choices in temporal resolution and behavioral classification influence reproducibility and interpretability.
Journal introduction:
The Journal of Neuroscience Methods publishes papers that describe new methods that are specifically for neuroscience research conducted in invertebrates, vertebrates or in man. Major methodological improvements or important refinements of established neuroscience methods are also considered for publication. The Journal's Scope includes all aspects of contemporary neuroscience research, including anatomical, behavioural, biochemical, cellular, computational, molecular, invasive and non-invasive imaging, optogenetic, and physiological research investigations.