Speech Intention Classification with Multimodal Deep Learning
Yue Gu, Xinyu Li, Shuhong Chen, Jianyu Zhang, Ivan Marsic
Advances in Artificial Intelligence. Canadian Society for Computational Studies of Intelligence. Conference
Published: 2017-05-01 (Epub 2017-04-11)
DOI: 10.1007/978-3-319-57351-9_30
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6261374/pdf/nihms-993283.pdf
Abstract
We present a novel multimodal deep-learning architecture that automatically extracts features from textual-acoustic data for sentence-level speech classification. Textual and acoustic features are first extracted by two independent convolutional neural networks, then combined into a joint representation, and finally fed into a softmax decision layer. We tested the proposed model in an actual medical setting, using speech recordings and their transcribed logs. Our model achieved 83.10% average accuracy in detecting six different intentions. We also found that our model, which uses automatically extracted features for intention classification, outperformed existing models that rely on hand-crafted features.
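The abstract only outlines the architecture (two independent CNN branches, a joint representation, and a softmax decision layer). Below is a minimal PyTorch sketch of that structure; all layer sizes, kernel widths, input dimensions, and the pooling choice are assumptions for illustration, not details from the paper.

```python
# Minimal sketch of the described textual-acoustic architecture.
# All hyperparameters (embedding size 300, 40 acoustic bands,
# 128-unit branches, kernel size 3) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BranchCNN(nn.Module):
    """One independent 1-D CNN branch, reused for both modalities."""

    def __init__(self, in_channels: int, out_dim: int = 128):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_dim, kernel_size=3, padding=1)

    def forward(self, x):            # x: (batch, in_channels, seq_len)
        h = F.relu(self.conv(x))
        return h.max(dim=2).values   # global max-pool -> (batch, out_dim)


class MultimodalIntentClassifier(nn.Module):
    def __init__(self, text_dim=300, audio_dim=40, num_intents=6):
        super().__init__()
        self.text_branch = BranchCNN(text_dim)    # over word embeddings
        self.audio_branch = BranchCNN(audio_dim)  # over acoustic frames
        self.decision = nn.Linear(128 + 128, num_intents)

    def forward(self, text, audio):
        # Combine both branches into a joint representation,
        # then apply the softmax decision layer.
        joint = torch.cat([self.text_branch(text),
                           self.audio_branch(audio)], dim=1)
        return F.log_softmax(self.decision(joint), dim=1)


# Example usage with random tensors standing in for real features:
model = MultimodalIntentClassifier()
text = torch.randn(8, 300, 20)    # 8 sentences, 300-d embeddings, 20 words
audio = torch.randn(8, 40, 100)   # 8 sentences, 40 bands, 100 frames
print(model(text, audio).shape)   # -> torch.Size([8, 6])
```

The six output units correspond to the six intention classes reported in the abstract; everything else in the sketch (branch widths, pooling, feature choices) would need to be matched against the full paper.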