{"title":"CRF-based Single-stage Acoustic Modeling with CTC Topology","authors":"Hongyu Xiang, Zhijian Ou","doi":"10.1109/ICASSP.2019.8682256","DOIUrl":null,"url":null,"abstract":"In this paper, we develop conditional random field (CRF) based single-stage (SS) acoustic modeling with connectionist temporal classification (CTC) inspired state topology, which is called CTC-CRF for short. CTC-CRF is conceptually simple, which basically implements a CRF layer on top of features generated by the bottom neural network with the special state topology. Like SS-LF-MMI (lattice-free maximum-mutual-information), CTC-CRFs can be trained from scratch (flat-start), eliminating GMM-HMM pre-training and tree-building. Evaluation experiments are conducted on the WSJ, Switchboard and Librispeech datasets. In a head-to-head comparison, the CTC-CRF model using simple Bidirectional LSTMs consistently outperforms the strong SS-LF-MMI, across all the three benchmarking datasets and in both cases of mono-phones and mono-chars. Additionally, CTC-CRFs avoid some ad-hoc operation in SS-LF-MMI.","PeriodicalId":13203,"journal":{"name":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"48 1","pages":"5676-5680"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2019.8682256","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 23
Abstract
In this paper, we develop conditional random field (CRF) based single-stage (SS) acoustic modeling with a connectionist temporal classification (CTC) inspired state topology, called CTC-CRF for short. CTC-CRF is conceptually simple: it basically implements a CRF layer, with this special state topology, on top of features generated by the bottom neural network. Like SS-LF-MMI (lattice-free maximum mutual information), CTC-CRFs can be trained from scratch (flat-start), eliminating GMM-HMM pre-training and tree-building. Evaluation experiments are conducted on the WSJ, Switchboard, and Librispeech datasets. In a head-to-head comparison, the CTC-CRF model using simple bidirectional LSTMs consistently outperforms the strong SS-LF-MMI baseline across all three benchmark datasets, for both mono-phones and mono-chars. Additionally, CTC-CRFs avoid some ad-hoc operations in SS-LF-MMI.
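To make the abstract's description concrete, the following is a minimal sketch of a CRF defined over CTC-topology state paths. The notation is illustrative rather than quoted from the paper: x is the acoustic feature sequence, \pi a frame-level state path under the CTC topology (blank symbol and label repetitions allowed), and B(\pi) the usual CTC collapsing map; the label-sequence potential p_LM (e.g., an n-gram language model over label sequences) is stated only schematically.

% Sketch of a CRF over CTC-topology paths (illustrative notation):
%   x       : acoustic feature sequence of length T
%   \pi     : frame-level state path under the CTC topology
%   B(\pi)  : label sequence after collapsing repeats and removing blanks
%   \log p(\pi_t \mid x) : per-frame node potential from the bottom neural network (e.g., a BLSTM)
\phi(\pi, x) \;=\; \log p_{\mathrm{LM}}\!\big(\mathcal{B}(\pi)\big) \;+\; \sum_{t=1}^{T} \log p(\pi_t \mid x),
\qquad
p_\theta(\pi \mid x) \;=\; \frac{\exp\big(\phi(\pi, x)\big)}{\sum_{\pi'} \exp\big(\phi(\pi', x)\big)}.

Under this formulation, flat-start training maximizes the conditional log-likelihood of the reference label sequence: the numerator sums over all paths \pi with B(\pi) equal to the reference transcription and the denominator over all paths, both computable with forward-backward style recursions, similarly in spirit to LF-MMI denominator computation.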