Segment-based speaker adaptation by neural network

Neural Networks for Signal Processing: Proceedings of the 1991 IEEE Workshop. Pub Date: 1991-09-30. DOI: 10.1109/NNSP.1991.239497

K. Fukuzawa, H. Sawai, M. Sugiyama

Citations: 4

Abstract

The authors propose a segment-to-segment speaker adaptation technique using a feed-forward neural network with a time-shifted sub-connection architecture. Differences in voice individuality exist in both the spectral and temporal domains. It is generally known that frame-based speaker adaptation techniques cannot compensate for speaker individuality in the temporal domain. Segment-based speaker adaptation compensates for both the spectral and the temporal differences. The results of 23 Japanese phoneme recognition experiments using a TDNN (time-delay neural network) show that the recognition rate with segment-based adaptation was 83.7%, 22.8% higher than the rate without adaptation.
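As a rough illustration of why a segment-level mapping can capture temporal differences that a frame-level mapping cannot, the sketch below contrasts the two. It is not the authors' network: the names `time_shifted_layer` and `make_weights`, the window width, the tanh activation, the zero padding at segment edges, and the random weights are all assumptions made for illustration, since the abstract does not give the architecture's details.

```python
import math
import random


def make_weights(width, d_in, d_out, seed=0):
    """Hypothetical helper: random weights, one d_out x d_in matrix per time offset."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(d_in)] for _ in range(d_out)]
        for _ in range(width)
    ]


def time_shifted_layer(segment, weights, bias):
    """Map an input segment to an output segment of the same length.

    segment: list of T frames, each a list of d_in floats.
    weights: list of `width` matrices (d_out x d_in), one per time offset.
    bias:    list of d_out floats.

    The same weights are reused at every time position (the shared,
    time-shifted part). Positions outside the segment are zero-padded.
    """
    n_frames = len(segment)
    d_in = len(segment[0])
    width = len(weights)
    half = width // 2
    zero_frame = [0.0] * d_in

    output = []
    for t in range(n_frames):
        acc = list(bias)
        for k in range(width):
            src = t + k - half  # input frame seen through offset k
            frame = segment[src] if 0 <= src < n_frames else zero_frame
            for o, row in enumerate(weights[k]):
                acc[o] += sum(w * x for w, x in zip(row, frame))
        output.append([math.tanh(a) for a in acc])
    return output


def frame_based(segment, bias, seed=0):
    """Width-1 window: each output frame depends on one input frame only."""
    d = len(segment[0])
    return time_shifted_layer(segment, make_weights(1, d, d, seed), bias)


def segment_based(segment, bias, width=3, seed=0):
    """Wider window: each output frame also depends on neighbouring frames."""
    d = len(segment[0])
    return time_shifted_layer(segment, make_weights(width, d, d, seed), bias)
```

With a window of width 1, changing a neighbouring input frame leaves a given output frame untouched, so timing differences between speakers cannot influence the mapping. With a wider window the output frame responds to its temporal context, which is the property the abstract attributes to segment-based adaptation.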


