{
  "title": "Cepstral-Basis-Decomposed Nonnegative Matrix Factorization for Speech Signal Modeling",
  "authors": "Fuga Oshima, Masashi Nakayama",
  "doi": "10.1002/tee.70028",
  "DOIUrl": null,
  "url": null,
  "abstract": "This study presents an enhanced nonnegative matrix factorization (NMF) algorithm designed for speech signal modeling. NMF has demonstrated efficacy across various applications to musical instrument signals, including audio source separation and music transcription. Nevertheless, its application to speech signals often results in diminished performance due to inadequate modeling arising from the spectral continuity of the speech signal. Hence, we introduced a pioneering approach termed cepstral-basis-decomposed NMF (CBD-NMF), which incorporates cepstrum analysis to enhance the modeling of speech signals. In the practical experiment, CBD-NMF is not necessarily convergence-guaranteed due to the flooring process; however, the experiment has revealed parameters that allow for stable optimization, ensuring that the cost function does not increase. By experimentally modeling Japanese vowel speech signals, we demonstrate that CBD-NMF induces better representation, in which one basis arises for one mora in Japanese. Additionally, when modeling a word in Japanese speech signals, CBD-NMF tends to induce a sparse representation equivalent to a sparse NMF with an extremely large weight coefficient. Our proposed framework can be applied to practical applications such as audio source separation and is expected to contribute to performance improvements when targeting speech signals. © 2025 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC.",
  "PeriodicalId": 13435,
  "journal": {
    "name": "IEEJ Transactions on Electrical and Electronic Engineering",
    "volume": "20 9",
    "pages": "1452-1459"
  },
  "PeriodicalIF": 1.1000,
  "publicationDate": "2025-05-26",
  "publicationTypes": "Journal Article",
  "fieldsOfStudy": null,
  "isOpenAccess": false,
  "openAccessPdf": "",
  "citationCount": "0",
  "platform": "Semanticscholar",
  "paperid": null,
  "PeriodicalName": "IEEJ Transactions on Electrical and Electronic Engineering",
  "FirstCategoryId": "5",
  "ListUrlMain": "https://onlinelibrary.wiley.com/doi/10.1002/tee.70028",
  "RegionNum": 4,
  "RegionCategory": "Engineering and Technology",
  "ArticlePicture": [],
  "TitleCN": null,
  "AbstractTextCN": null,
  "PMCID": null,
  "EPubDate": "",
  "PubModel": "",
  "JCR": "Q4",
  "JCRName": "ENGINEERING, ELECTRICAL & ELECTRONIC",
  "Score": null,
  "Total": 0
}
Citation count: 0