Cepstral-Basis-Decomposed Nonnegative Matrix Factorization for Speech Signal Modeling

IF 1.1; Zone 4 (Engineering & Technology); Q4 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEJ Transactions on Electrical and Electronic Engineering. Pub Date: 2025-05-26. DOI: 10.1002/tee.70028

Fuga Oshima, Masashi Nakayama

Vol. 20, No. 9, pp. 1452-1459


Abstract

This study presents an enhanced nonnegative matrix factorization (NMF) algorithm for speech signal modeling. NMF has proven effective in applications to musical instrument signals, including audio source separation and music transcription. Applied to speech signals, however, it often performs worse because the spectral continuity of speech leads to inadequate modeling. We therefore introduce cepstral-basis-decomposed NMF (CBD-NMF), which incorporates cepstrum analysis to improve the modeling of speech signals. Because of the flooring process, convergence of CBD-NMF is not guaranteed in general; however, our experiments identified parameters that allow stable optimization, in which the cost function does not increase. By experimentally modeling Japanese vowel speech signals, we demonstrate that CBD-NMF yields a better representation, in which one basis arises for each mora in Japanese. Additionally, when modeling a word in Japanese speech, CBD-NMF tends to produce a sparse representation equivalent to that of a sparse NMF with an extremely large weight coefficient. The proposed framework can be applied to practical tasks such as audio source separation and is expected to improve performance when the target is speech. © 2025 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.
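The abstract does not give the CBD-NMF update rules, so the following is only a minimal sketch of the two standard ingredients it names: plain NMF with Lee-Seung multiplicative updates, and a real cepstrum computed from a floored log-magnitude spectrum. It is not the authors' algorithm. The function names (`nmf`, `real_cepstrum`), the Euclidean cost, the `floor` value, and the toy matrix are all assumptions made for illustration, and the floor here only keeps the logarithm finite, which may differ from the role of the flooring process in the paper.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def euclidean_cost(V, W, H):
    """Squared Euclidean distance between V and the model W H."""
    WH = matmul(W, H)
    return sum((v - m) ** 2 for rv, rm in zip(V, WH) for v, m in zip(rv, rm))


def nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Plain NMF (not CBD-NMF): V ~ W H with multiplicative updates."""
    rng = random.Random(seed)
    n_freq, n_frames = len(V), len(V[0])
    # Strictly positive random initialization (hypothetical choice).
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    H = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(rank)]
    costs = [euclidean_cost(V, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
        costs.append(euclidean_cost(V, W, H))
    return W, H, costs


def real_cepstrum(magnitude, floor=1e-6):
    """Real part of the inverse DFT of the floored log-magnitude spectrum.

    `magnitude` is assumed to be a full (symmetric) magnitude spectrum.
    Values below `floor` are clipped so that the logarithm stays finite.
    """
    log_mag = [math.log(max(m, floor)) for m in magnitude]
    n = len(log_mag)
    return [sum(l * math.cos(2 * math.pi * k * q / n)
                for k, l in enumerate(log_mag)) / n
            for q in range(n)]


if __name__ == "__main__":
    # Toy nonnegative "spectrogram": 3 frequency bins, 4 frames.
    V = [[1.0, 0.5, 0.0, 2.0],
         [0.0, 1.0, 3.0, 0.5],
         [2.0, 0.1, 1.0, 1.0]]
    W, H, costs = nmf(V, rank=2)
    print("initial cost:", costs[0], "final cost:", costs[-1])
    print("cepstrum of a symmetric spectrum:",
          real_cepstrum([1.0, 0.5, 0.0, 0.5]))
```

For this plain Euclidean NMF the multiplicative updates do not increase the cost, which is the property the abstract says is lost in general once a flooring step is introduced. In practice a numerical library would replace the list-based matrix helpers; they are used here only to keep the sketch dependency-free.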


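Source journal: IEEJ Transactions on Electrical and Electronic Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 2.70

Self-citation rate: 10.00%

Articles published: 199

Review time: 4.3 months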

Journal description: IEEJ Transactions on Electrical and Electronic Engineering (hereinafter "TEEE") is published 6 times per year as an official journal of the Institute of Electrical Engineers of Japan (hereinafter "IEEJ"). This peer-reviewed journal contains original research papers and review articles on the most important and latest technological advances in core areas of Electrical and Electronic Engineering and in related disciplines. The journal also publishes short communications reporting the results of the latest research activities. TEEE aims to provide a new forum for IEEJ members in Japan, as well as fellow researchers in Electrical and Electronic Engineering from around the world, to exchange ideas and research findings.