ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

arXiv - CS - Sound | Pub Date: 2024-09-14 | DOI: arxiv-2409.09506

Masao Someki, Kwanghee Choi, Siddhant Arora, William Chen, Samuele Cornell, Jionghao Han, Yifan Peng, Jiatong Shi, Vaibhav Srivastav, Shinji Watanabe

Citations: 0

Abstract

We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on various tasks, and (ii) easy integration with popular deep neural network frameworks such as PyTorch-Lightning, Hugging Face transformers and datasets, and Lhotse. By replacing ESPnet design choices inherited from Kaldi with a Python-only, Bash-free interface, we dramatically reduce the effort required to build, debug, and use a new model. For example, when fine-tuning a speech foundation model, ESPnet-EZ reduces the amount of newly written code by 2.7x and the amount of dependent code by 6.7x compared to ESPnet, while dramatically reducing Bash script dependencies. The codebase of ESPnet-EZ is publicly available.
