Tunable time delay neural networks for isolated word recognition

Proceedings IEEE Southeastcon '95. Visualize the Future
Pub Date: 1995-03-26
DOI: 10.1109/SECON.1995.513088

Duanpei Wu, J. Gowdy

Citations: 0

Abstract

This paper describes a new neural network structure and a corresponding sequential training technique for speech recognition. The proposed system modifies the original time delay neural network (TDNN) structure of Waibel et al. [1989]. The new structure consists of a group of sub-nets, one for each isolated word or phoneme to be recognized. Because each sub-net handles only one recognition unit, it can be trained independently. Each sub-net is a TDNN that the authors train with the new sequential training algorithm. The system attained close to 100% accuracy on a multi-speaker, isolated word recognition task and 86.44% accuracy on a speaker-independent phoneme recognition task covering the three voiced stop consonants ("B", "D" and "G"). The phoneme recognition results compare favorably with the best result obtained by Bryant [1992] using Sawai's block windowed neural network architecture, an improvement of 14.44% on the same task.
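The one-sub-net-per-unit structure described in the abstract can be illustrated with a minimal sketch of the forward pass only. Everything concrete here is an assumption made for illustration, not a detail taken from the paper: the layer sizes, the delay window lengths, the tanh nonlinearity, averaging the output over time, and the names `tdnn_layer`, `SubNet` and `recognize`. The weights are random and untrained, and the authors' sequential training algorithm is not shown, since the abstract does not specify it.

```python
import numpy as np


def tdnn_layer(x, w, b):
    """Apply one time-delay layer.

    x: (T, F) sequence of T frames with F features each.
    w: (D, F, H) weights shared across time, covering D consecutive frames.
    b: (H,) bias.
    Returns an array of shape (T - D + 1, H).
    """
    d = w.shape[0]
    t_out = x.shape[0] - d + 1
    # windows[t, i] is frame t + i, so each row holds D consecutive frames.
    windows = np.stack([x[i:i + t_out] for i in range(d)], axis=1)
    return np.tanh(np.einsum("tdf,dfh->th", windows, w) + b)


class SubNet:
    """Hypothetical two-layer TDNN that scores a single recognition unit."""

    def __init__(self, rng, n_features=16, n_hidden=8, delays=(3, 5)):
        # Sizes and delays are illustrative assumptions, not the paper's values.
        self.w1 = rng.normal(0.0, 0.1, (delays[0], n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, (delays[1], n_hidden, 1))
        self.b2 = np.zeros(1)

    def score(self, x):
        hidden = tdnn_layer(x, self.w1, self.b1)
        out = tdnn_layer(hidden, self.w2, self.b2)
        # Average the single output unit over time to get one score.
        return float(out.mean())


def recognize(subnets, x):
    """Score the input with every sub-net and return the best-scoring label."""
    scores = {label: net.score(x) for label, net in subnets.items()}
    return max(scores, key=scores.get), scores


rng = np.random.default_rng(0)
# One independent sub-net per unit, here the three voiced stop consonants.
subnets = {label: SubNet(rng) for label in ("B", "D", "G")}
frames = rng.normal(size=(15, 16))  # stand-in for 15 frames of 16 features
label, scores = recognize(subnets, frames)
```

Because each sub-net holds its own weights and produces its own score, a unit could in principle be added or retrained without touching the others, which is the independence property the abstract emphasizes.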


