Deep learning with maximal figure-of-merit cost to advance multi-label speech attribute detection

2016 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2016-12-01. DOI: 10.1109/SLT.2016.7846308

Ivan Kukanov, Ville Hautamäki, S. Siniscalchi, Kehuang Li

Citations: 8

Abstract

In this work, we aim to improve speech attribute detection by formulating it as a multi-label classification task, with deep neural networks (DNNs) serving as the speech attribute detectors. A straightforward way to tackle the task is to estimate the DNN parameters with the mean squared error (MSE) loss function and to use a sigmoid function at the DNN output nodes. A more principled way, however, is to incorporate the micro-F1 measure, a widely used metric in multi-label classification, into the DNN loss function so that the metric of interest is optimized directly at training time. Micro-F1 is not differentiable, but we overcome this problem by casting our task in the maximal figure-of-merit (MFoM) learning framework. The results demonstrate that our MFoM approach consistently outperforms the baseline systems.
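To make the contrast in the abstract concrete, the sketch below compares an MSE loss on sigmoid outputs with a smoothed micro-F1 surrogate. The abstract does not give the MFoM equations, so this is not the authors' formulation: it is a generic "soft count" approximation in which the hard true-positive, false-positive, and false-negative counts are replaced by sigmoid scores, which makes the quantity differentiable in the scores. All function names and the toy data are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, as used at the DNN output nodes."""
    return 1.0 / (1.0 + np.exp(-z))

def mse_loss(y_true, scores):
    """Baseline loss: mean squared error between targets and sigmoid scores."""
    return np.mean((y_true - scores) ** 2)

def micro_f1(y_true, y_pred):
    """Exact micro-F1: counts are pooled over all (sample, label) pairs.
    Both inputs are 0/1 arrays of shape (n_samples, n_labels)."""
    tp = np.sum(y_true * y_pred)
    fp = np.sum((1 - y_true) * y_pred)
    fn = np.sum(y_true * (1 - y_pred))
    return 2 * tp / (2 * tp + fp + fn)

def soft_micro_f1_loss(y_true, scores, eps=1e-12):
    """Illustrative surrogate (an assumption, not the paper's MFoM loss):
    1 - micro-F1 with hard counts replaced by continuous scores in [0, 1]."""
    tp = np.sum(y_true * scores)
    fp = np.sum((1 - y_true) * scores)
    fn = np.sum(y_true * (1 - scores))
    return 1.0 - 2 * tp / (2 * tp + fp + fn + eps)

# Toy multi-label targets: 2 frames, 3 hypothetical speech attributes.
y_true = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)
logits = np.array([[2.0, -1.0, 0.5], [-2.0, 1.5, -0.5]])

scores = sigmoid(logits)
hard = (scores >= 0.5).astype(float)  # thresholded decisions

print("MSE loss:          ", mse_loss(y_true, scores))
print("soft micro-F1 loss:", soft_micro_f1_loss(y_true, scores))
print("hard micro-F1:     ", micro_f1(y_true, hard))
```

The thresholding step in `hard` is what makes exact micro-F1 non-differentiable; the surrogate avoids it by working on the scores directly, which is the general idea behind optimizing a smoothed version of the evaluation metric.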

