Exploration of vowel onset and offset points for hybrid speech segmentation

TENCON 2015 - 2015 IEEE Region 10 Conference, pp. 1-6. Pub Date: 2015-11-01. DOI: 10.1109/TENCON.2015.7373137

B. Sarma, Bidisha Sharma, S. Shanmugam, S. R. Mahadeva Prasanna, H. Murthy


Citations: 4

Abstract



Automatic segmentation of speech using embedded reestimation of monophone hidden Markov models (HMMs) followed by forced alignment may not give accurate boundaries. Group delay (GD) processing was attempted earlier to refine the boundaries at the syllable level. This paper explores vowel onset points (VOPs) and vowel offset or end points (VEPs) for correcting the boundaries obtained from HMM alignment. HMMs model the class information well but may not detect the exact boundary. VOP and VEP detection can produce spurious or missed detections, but the boundaries it does detect are more accurate. Combining HMM and VOP/VEP information improves the log-likelihood scores of the forced-aligned phoneme boundaries. HMM boundaries are corrected using VOPs/VEPs, and model parameters are reestimated at the syllable level. The results are compared with those of GD-based correction, and the overall performance is found to be comparable. Performance for vowels is higher than with GD-based refinement, since the refinement here is mainly at the vowel boundaries. HMM-based speech synthesis systems (HTS) are developed using the phone as the basic unit with the proposed segmentation method. Subjective evaluation indicates an improvement in synthesis quality.
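The abstract does not spell out the correction rule, so the following is only a minimal sketch of one plausible reading: each vowel segment from the HMM alignment has its start moved to the nearest detected VOP and its end moved to the nearest detected VEP, provided the detection lies within a tolerance. The function names, the `(label, start, end)` segment format, and the 20 ms tolerance are all assumptions made for illustration, not details from the paper. Missed detections leave the HMM boundary unchanged, and spurious detections far from any boundary are ignored.

```python
from bisect import bisect_left


def nearest(points, t):
    """Return the value in the sorted list `points` closest to t, or None if empty."""
    if not points:
        return None
    i = bisect_left(points, t)
    candidates = points[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs(p - t))


def correct_boundaries(segments, vops, veps, vowels, tol=0.02):
    """Snap vowel boundaries from an HMM alignment to nearby VOP/VEP detections.

    segments: contiguous list of (label, start, end), times in seconds.
    vops, veps: detected vowel onset / end point times in seconds.
    vowels: set of labels treated as vowels (hypothetical label set).
    tol: maximum allowed shift in seconds (assumed value, not from the paper).
    """
    vops, veps = sorted(vops), sorted(veps)
    # Store shared boundaries once so corrected segments stay contiguous.
    bounds = [segments[0][1]] + [end for _, _, end in segments]
    for k, (label, _, _) in enumerate(segments):
        if label not in vowels:
            continue
        # Boundary k is the vowel start (use VOPs); k + 1 is its end (use VEPs).
        for idx, events in ((k, vops), (k + 1, veps)):
            cand = nearest(events, bounds[idx])
            if cand is None or abs(cand - bounds[idx]) > tol:
                continue  # missed detection or too far away: keep the HMM boundary
            lo = bounds[idx - 1] if idx > 0 else float("-inf")
            hi = bounds[idx + 1] if idx + 1 < len(bounds) else float("inf")
            if lo < cand < hi:  # never let a boundary cross its neighbours
                bounds[idx] = cand
    return [(label, bounds[k], bounds[k + 1])
            for k, (label, _, _) in enumerate(segments)]


if __name__ == "__main__":
    hmm_alignment = [("k", 0.00, 0.10), ("a", 0.10, 0.25), ("t", 0.25, 0.30)]
    print(correct_boundaries(hmm_alignment, vops=[0.112], veps=[0.241, 0.5],
                             vowels={"a"}))
```

In the pipeline the abstract describes, corrected boundaries of this kind would then be used to reestimate the HMM parameters at the syllable level. That step depends on the HMM toolkit in use and is not sketched here.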

