ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe
arXiv:2409.09506 (arXiv - CS - Sound), published 2024-09-14

Abstract

We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on various tasks and (ii) easy integration with popular deep neural network frameworks such as PyTorch-Lightning, Hugging Face transformers and datasets, and Lhotse. By replacing ESPnet design choices inherited from Kaldi with a Python-only, Bash-free interface, we dramatically reduce the effort required to build, debug, and use a new model. For example, to fine-tune a speech foundation model, ESPnet-EZ, compared to ESPnet, reduces the amount of newly written code by 2.7x and the amount of dependent code by 6.7x, while dramatically reducing Bash script dependencies. The codebase of ESPnet-EZ is publicly available.