vak: a neural network framework for researchers studying animal acoustic communication

D. Nicholson, Y. Cohen
{"title":"vak: a neural network framework for researchers studying animal acoustic communication","authors":"D. Nicholson, Y. Cohen","doi":"10.25080/gerudo-f2bc6f59-008","DOIUrl":null,"url":null,"abstract":"—How is speech like birdsong? What do we mean when we say an animal learns their vocalizations? Questions like these are answered by studying how animals communicate with sound. As in many other fields, the study of acoustic communication is being revolutionized by deep neural network models. These models enable answering questions that were previously impossible to address, in part because the models automate analysis of very large datasets. Acoustic communication researchers have developed multiple models for similar tasks, often implemented as research code with one of several libraries, such as Keras and Pytorch. This situation has created a real need for a framework that allows researchers to easily benchmark multiple models, and test new models, with their own data. To address this need, we developed vak (https://github.com/vocalpy/vak), a neural network framework designed for acoustic communication researchers. (\"vak\" is pronounced like \"talk\" or \"squawk\" and was chosen for its similarity to the Latin root voc , as in \"vocal\".) Here we describe the design of the vak, and explain how the framework makes it easy for researchers to apply neural network models to their own data. We highlight enhancements made in version 1.0 that significantly improve user experience with the library. To provide researchers without expertise in deep learning access to these models, vak can be run via a command-line interface that uses configuration files. Vak can also be used directly in scripts by scientist-coders. To achieve this, vak adapts design patterns and an API from other domain-specific PyTorch libraries such as torchvision, with modules representing neural network operations, models, datasets, and transformations for pre-and post-processing. 
vak also leverages the Lightning library as a backend, so that vak developers and users can focus on the domain. We provide proof-of-concept results showing how vak can be used to test new models and compare existing models from multiple model families. In closing we discuss our roadmap for development and vision for the community","PeriodicalId":364654,"journal":{"name":"Proceedings of the Python in Science Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Python in Science Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.25080/gerudo-f2bc6f59-008","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

How is speech like birdsong? What do we mean when we say an animal learns their vocalizations? Questions like these are answered by studying how animals communicate with sound. As in many other fields, the study of acoustic communication is being revolutionized by deep neural network models. These models enable answering questions that were previously impossible to address, in part because the models automate analysis of very large datasets. Acoustic communication researchers have developed multiple models for similar tasks, often implemented as research code with one of several libraries, such as Keras and PyTorch. This situation has created a real need for a framework that allows researchers to easily benchmark multiple models, and test new models, with their own data. To address this need, we developed vak (https://github.com/vocalpy/vak), a neural network framework designed for acoustic communication researchers. ("vak" is pronounced like "talk" or "squawk" and was chosen for its similarity to the Latin root voc, as in "vocal".) Here we describe the design of vak, and explain how the framework makes it easy for researchers to apply neural network models to their own data. We highlight enhancements made in version 1.0 that significantly improve user experience with the library. To provide researchers without expertise in deep learning access to these models, vak can be run via a command-line interface that uses configuration files. Vak can also be used directly in scripts by scientist-coders. To achieve this, vak adapts design patterns and an API from other domain-specific PyTorch libraries such as torchvision, with modules representing neural network operations, models, datasets, and transformations for pre- and post-processing. vak also leverages the Lightning library as a backend, so that vak developers and users can focus on the domain.
We provide proof-of-concept results showing how vak can be used to test new models and compare existing models from multiple model families. In closing we discuss our roadmap for development and vision for the community.
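As a concrete illustration of the configuration-file workflow the abstract describes: vak is driven by TOML files passed to subcommands such as `vak prep` and `vak train`. The table and key names below are a sketch of the version 1.0 layout, not an authoritative schema; consult the vak documentation for the exact keys.

```toml
# Illustrative vak 1.0-style configuration (key names are a sketch,
# not an authoritative schema).
[vak.prep]
data_dir = "./data/bird1"      # directory of audio + annotation files
output_dir = "./prep/bird1"    # where prepared datasets are written
audio_format = "wav"

[vak.train]
model = "TweetyNet"            # name of a built-in model family
batch_size = 8
num_epochs = 2
```

A researcher would then run something like `vak prep config.toml` followed by `vak train config.toml`, without writing any Python.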
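To make the torchvision-style design pattern concrete, here is a minimal, self-contained Python sketch of the idea the abstract describes: separate components for transforms, datasets, and models, composed together. All names here (`standardize`, `FrameDataset`, `Model`) are hypothetical stand-ins for illustration, not vak's actual API.

```python
# Sketch of the torchvision-style composition pattern vak adapts:
# a transform callable, a dataset that applies it, and a model
# wrapper that bundles the two. Names are illustrative only.
from dataclasses import dataclass
from typing import Callable, List, Optional


def standardize(frame: List[float]) -> List[float]:
    """Toy 'transform': center one spectrogram frame at zero mean."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


@dataclass
class FrameDataset:
    """A 'datasets'-style class: pairs raw samples with a transform."""
    samples: List[List[float]]
    transform: Optional[Callable] = None

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> List[float]:
        item = self.samples[idx]
        return self.transform(item) if self.transform else item


@dataclass
class Model:
    """A 'models'-style wrapper bundling a network with its dataset."""
    name: str
    dataset: FrameDataset

    def predict(self, idx: int) -> List[float]:
        # A real model would run a neural network forward pass here;
        # we just return the transformed frame to show the composition.
        return self.dataset[idx]


ds = FrameDataset(samples=[[1.0, 2.0, 3.0]], transform=standardize)
model = Model(name="tweetynet-like", dataset=ds)
print(model.predict(0))  # → [-1.0, 0.0, 1.0]
```

The design choice this mirrors is that each layer (transform, dataset, model) can be swapped independently, which is what lets a framework like vak benchmark multiple model families on the same prepared data.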