Improving speech intelligibility in cochlear implants using vocoder-centric acoustic models

A. R. Gladston, P. Vijayalakshmi, N. Thangavelu
Published in: 2012 International Conference on Recent Trends in Information Technology
Publication date: 2012-04-19
DOI: 10.1109/ICRTIT.2012.6206795 (https://doi.org/10.1109/ICRTIT.2012.6206795)
Citations: 5

Abstract

A cochlear implant is a prosthetic device used to replace a damaged inner ear. It consists of an externally worn speech processor and an internal receiver-stimulator. Because cochlear implants are both patient-specific and system-specific, this work designs a laboratory model of the speech processor, based on several vocoder models, to analyse how system-specific parameters such as filter bandwidth, number of channels, and vocal excitation affect speech intelligibility. A formant vocoder is designed first and used for the analysis and synthesis of English vowels. A channel vocoder is then developed for the same task and extended to the analysis and synthesis of words from the Lexical Neighbourhood Test and sentences from the TIMIT database. The effect of the number of channels on synthetic speech quality is analysed; a 21-channel vocoder gives the best response, with a mean opinion score (MOS) of 4 out of 5 for vowels and 3.4 for sentences. Formant trajectories and the CosH distance are also used to validate speech intelligibility. Finally, the influence of the glottal pulse is analysed: synthetic speech sounds more natural with a glottal pulse train than with an impulse train as excitation, with an MOS of 4.2 for vowels and 4 for sentences.
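The abstract names the CosH distance as an objective check on the synthetic speech but does not define it. As a rough illustration only, the sketch below uses one common form of the measure: the mean over frequency bins of cosh(ln P - ln Q) - 1, which equals the average of the two directed Itakura-Saito distortions between power spectra P and Q. The function name cosh_distance, the normalization, and the toy spectra are assumptions made for this example and are not taken from the paper; other normalizations exist.

```python
import math


def cosh_distance(p, q):
    """Mean COSH distance between two power spectra on the same frequency bins.

    Assumed form (not confirmed by the paper): the mean over bins of
    cosh(ln p - ln q) - 1, i.e. the average of the two directed
    Itakura-Saito distortions. It is zero for identical spectra and
    symmetric in its arguments.
    """
    if len(p) != len(q) or len(p) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    if any(x <= 0 for x in p) or any(x <= 0 for x in q):
        raise ValueError("power spectra must be strictly positive")

    total = 0.0
    for pk, qk in zip(p, q):
        log_ratio = math.log(pk) - math.log(qk)  # V = ln P - ln Q for this bin
        total += math.cosh(log_ratio) - 1.0
    return total / len(p)


# Hypothetical power spectra of an original and a vocoded frame (toy values).
original = [1.0, 4.0, 2.0, 0.5]
synthetic = [1.2, 3.5, 2.0, 0.4]
print(cosh_distance(original, synthetic))
```

A smaller value indicates that the synthetic spectrum follows the original more closely. In practice the measure would be averaged over many short-time frames, which this sketch leaves out.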