A TDPSOLA based concatenation technique for Bengali text to speech synthesis system Subachan

K. Swarna, Abu Naser
Published in: 2016 9th International Conference on Electrical and Computer Engineering (ICECE), 2016-12-01
DOI: 10.1109/ICECE.2016.7853866 (https://doi.org/10.1109/ICECE.2016.7853866)
Citations: 2

Abstract

Creating an intelligible and natural text-to-speech synthesizer has been the ultimate goal of researchers for the past 30 years, and concatenative synthesis provides the most natural speech. Concatenative speech synthesis usually suffers from distortions at the concatenation points, which produce audible clicks in the synthesized speech. Several signal-processing concatenation algorithms exist to solve this problem, such as TDPSOLA, FDPSOLA, and MBROLA. This paper addresses the audible discontinuities at the concatenation points of diphones in the Bengali speech synthesizer Subachan and resolves them using TDPSOLA. To do so, we detected the correct pitch-mark locations of the diphones, classified their speech frames as voiced or unvoiced, and finally concatenated the diphones using TDPSOLA after rescaling them to remove energy mismatches. As a result, the audible clicks at the concatenation points are removed and speech of much better quality is generated.
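
The abstract names the processing steps but gives no implementation details, so the following is only a minimal sketch under stated assumptions, not the authors' method. It illustrates three of the ideas: a voiced/unvoiced decision (here a common energy and zero-crossing heuristic with illustrative thresholds), energy rescaling at the join, and a raised-cosine overlap-add cross-fade over one pitch period, which is a simplified stand-in for the overlap-add step of TDPSOLA rather than the full algorithm. All function names, parameters, and threshold values are hypothetical, and the pitch period is assumed to be known already.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame, energy_floor=0.01, zcr_ceiling=0.25):
    """Heuristic decision: voiced frames tend to have high energy and few
    zero crossings. Both thresholds are illustrative assumptions."""
    return rms(frame) > energy_floor and zero_crossing_rate(frame) < zcr_ceiling


def match_energy(left, right, span):
    """Scale `right` so the RMS of its first `span` samples equals the RMS
    of the last `span` samples of `left`."""
    target = rms(left[-span:])
    current = rms(right[:span])
    if current == 0.0:
        return list(right)  # nothing to scale against
    gain = target / current
    return [s * gain for s in right]


def overlap_add_join(left, right, period):
    """Cross-fade the last `period` samples of `left` with the first
    `period` samples of `right` using complementary raised-cosine weights."""
    if period <= 0 or period > min(len(left), len(right)):
        raise ValueError("period must fit inside both segments")
    start = len(left) - period
    blended = []
    for i in range(period):
        w = 0.5 * (1.0 - math.cos(math.pi * (i + 0.5) / period))  # rises 0 -> 1
        blended.append(left[start + i] * (1.0 - w) + right[i] * w)
    return left[:start] + blended + right[period:]


# Two synthetic "diphones": 100 Hz sines at 16 kHz with mismatched energy.
sample_rate = 16000
period = sample_rate // 100  # one pitch period in samples (assumed known)
left = [0.5 * math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]
right = [0.1 * math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]

scaled = match_energy(left, right, period)
joined = overlap_add_join(left, scaled, period)
```

Because the join consumes one pitch period from each segment and replaces both with a single blended period, the output is one period shorter than the two inputs laid end to end. A real system would place the cross-fade on detected pitch marks and would treat unvoiced frames differently.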