Speech Enhancement with Generative Diffusion Models
O. V. Girfanov, A. G. Shishkin
Automatic Documentation and Mathematical Linguistics, published 2023-11-24. DOI: 10.3103/S0005105523050035
We propose an alternative approach to speech denoising based on generative diffusion models, which learn the distribution of the training data. In recent years, such models have achieved promising results in generating signals of various kinds, and in many respects they outperform earlier generative models such as variational autoencoders. However, diffusion models have not yet been widely applied to speech denoising. We present a new diffusion model that uses a deep neural network to denoise real speech signals. We have also created our own dataset containing more than 150 h of clean Russian speech. We evaluated the results with two metrics, the scale-invariant signal-to-distortion ratio (SI-SDR) and perceptual evaluation of speech quality (PESQ). On these metrics the results are comparable or superior to those of the best discriminative models.
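SI-SDR is one of the two metrics named above. It compares an enhanced signal with the clean reference after optimally rescaling the reference, so changing the output gain alone does not change the score. The following is a minimal pure-Python sketch of the commonly used definition. Both signals are mean-centred, then the scale is computed as alpha = <ŝ, s> / ‖s‖², and the score is SI-SDR = 10·log10(‖αs‖² / ‖ŝ − αs‖²). It is an illustration only, not the evaluation code used by the authors. The function name `si_sdr` and the small `eps` guard against division by zero are our own assumptions. PESQ, the other metric, is the standardized ITU-T P.862 algorithm and is not reproduced here.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-12) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    Assumption: both signals are mean-centred first, a common convention.
    `eps` is a hypothetical guard added here to avoid division by zero.
    """
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the DC offset from both signals
    mean_est = sum(estimate) / len(estimate)
    mean_ref = sum(reference) / len(reference)
    est = [x - mean_est for x in estimate]
    ref = [x - mean_ref for x in reference]

    # Optimal scale of the reference: alpha = <est, ref> / ||ref||^2
    dot = sum(e * r for e, r in zip(est, ref))
    alpha = dot / (sum(r * r for r in ref) + eps)

    # Split the estimate into a scaled target and a residual error
    target = [alpha * r for r in ref]
    error = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    error_energy = sum(d * d for d in error)
    return 10.0 * math.log10((target_energy + eps) / (error_energy + eps))


if __name__ == "__main__":
    clean = [1.0, -1.0, 2.0, -2.0]
    noisy = [1.0, -0.5, 2.5, -2.0]
    print(si_sdr(noisy, clean))                     # about 16.90 dB
    print(si_sdr([3.0 * x for x in noisy], clean))  # same value: gain does not matter
```

In the toy example, the residual error energy is 1/49 of the scaled-target energy, so the score is 10·log10(49) ≈ 16.9 dB. Multiplying the estimate by 3 leaves the score unchanged, which is the property that makes SI-SDR "scale-invariant". An estimate that is an exact scaled copy of the reference scores very high.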
About the journal:
Automatic Documentation and Mathematical Linguistics is an international peer reviewed journal that covers all aspects of automation of information processes and systems, as well as algorithms and methods for automatic language analysis. Emphasis is on the practical applications of new technologies and techniques for information analysis and processing.