VOWEL DURATION MEASUREMENT USING DEEP NEURAL NETWORKS

IEEE International Workshop on Machine Learning for Signal Processing (MLSP), proceedings. Pub date: 2015-09-01; Epub date: 2015-11-12. DOI: 10.1109/MLSP.2015.7324331

Yossi Adi, Joseph Keshet, Matthew Goldrick

Abstract

Vowel durations are most often used in studies that address specific issues in phonetics. Such work has so far been hampered by its reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for accurate automatic measurement of vowel duration. The input to the algorithm is a speech segment containing one vowel preceded and followed by consonants (CVC). Our algorithm is based on a deep neural network trained at the frame level on manually annotated data from a phonetic study. Specifically, we try two deep-network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), and compare their accuracy to that of an HMM-based forced aligner. Results suggest that the CNN outperforms the DBN and that the CNN and the HMM-based forced aligner perform comparably. However, neither yielded the same predictions as models fit to manually annotated data.
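The network is trained at the frame level, but the quantity of interest is a single vowel duration. Frame decisions therefore have to be collapsed into one onset and one offset. The Python sketch below shows one simple way to do this. It is an illustration under stated assumptions, not the authors' decoding procedure.

It assumes the following:
- a hypothetical sequence of per-frame vowel probabilities, such as the output of a CNN or DBN;
- a 10 ms frame shift and a 0.5 decision threshold;
- that the vowel is the longest contiguous run of above-threshold frames, which relies on the CVC input containing exactly one vowel.

The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def frames_to_vowel_span(
    vowel_probs: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[int, int]:
    """Return (onset_frame, offset_frame_exclusive) of the longest run of
    frames whose vowel probability is at or above `threshold`.

    Assumption (not from the paper): a CVC segment holds a single vowel,
    so shorter above-threshold runs are treated as spurious and ignored.
    Returns (0, 0) if no frame reaches the threshold.
    """
    best_start, best_len = 0, 0
    run_start: Optional[int] = None
    for i, p in enumerate(vowel_probs):
        if p >= threshold:
            if run_start is None:
                run_start = i  # a new candidate vowel run begins here
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # consonant/silence frame ends the current run
    return best_start, best_start + best_len


def vowel_duration_ms(
    vowel_probs: Sequence[float],
    frame_shift_ms: float = 10.0,  # assumed frame shift; not stated in the abstract
    threshold: float = 0.5,        # assumed decision threshold
) -> float:
    """Convert frame-level vowel probabilities into a duration in milliseconds."""
    onset, offset = frames_to_vowel_span(vowel_probs, threshold)
    return (offset - onset) * frame_shift_ms
```

For example, `vowel_duration_ms([0.1, 0.2, 0.7, 0.9, 0.95, 0.8, 0.3, 0.6, 0.1])` selects frames 2 through 5 and returns 40.0. The isolated above-threshold frame at index 7 is discarded because the segment is assumed to contain only one vowel.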
