Spoken term detection from continuous speech using ANN posteriors and image processing techniques

R. Shankar, Arpit Jain, K. Deepak, C. Vikram, S. Prasanna
Published in: 2016 Twenty Second National Conference on Communication (NCC), 2016-03-04. DOI: 10.1109/NCC.2016.7561151 (https://doi.org/10.1109/NCC.2016.7561151)
Citations: 4

Abstract

This work aims to demonstrate the value of morphological image processing techniques for spoken term detection from continuous speech. Phone posterior probabilities for the reference speech data and the query word are obtained from a hybrid phoneme recognizer based on a Hidden Markov Model (HMM) and an Artificial Neural Network (ANN). The phone posteriors of the query word and the reference data are matched using non-segmental Dynamic Time Warping (DTW). To decide whether a keyword is present in a particular reference file, an image processing based approach is proposed: the DTW accumulation matrix is viewed as a grayscale image and processed using binarization and skeletonization operations, and the keyword is declared present when a dark diagonal streak is observed in the processed image. The phoneme recognizer is trained on the TIMIT training set, and twenty words chosen at random from the TIMIT test data serve as keywords. The algorithm is evaluated for each keyword against the entire TIMIT test data as the reference, and an accuracy of about 85% with an error rate of less than 8% is reported.
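
The pipeline in the abstract (posterior matching by non-segmental DTW, binarization of the accumulation matrix, and a search for a dark diagonal streak) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The negative-log inner-product distance, the per-row normalization, the fixed threshold of 0.5, the 80% streak-length rule, and all function names are assumptions made for illustration, and the skeletonization step is omitted, so only strictly diagonal runs are counted.

```python
import math


def local_distance(p, q, eps=1e-10):
    # Assumed local distance: -log of the inner product of two posterior vectors.
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))


def accumulate(query, reference):
    # Non-segmental (subsequence) DTW accumulation matrix.
    # Rows index query frames, columns index reference frames.
    n, m = len(query), len(reference)
    d = [[local_distance(query[i], reference[j]) for j in range(m)]
         for i in range(n)]
    acc = [[0.0] * m for _ in range(n)]
    for j in range(m):
        acc[0][j] = d[0][j]  # a match may start at any reference frame
    for i in range(1, n):
        acc[i][0] = acc[i - 1][0] + d[i][0]
        for j in range(1, m):
            acc[i][j] = d[i][j] + min(acc[i - 1][j],
                                      acc[i][j - 1],
                                      acc[i - 1][j - 1])
    return acc


def binarize(acc, threshold):
    # Treat the matrix as an image: a pixel is "dark" (True) when the
    # accumulated cost per query frame is at or below the threshold (assumed rule).
    return [[(value / (i + 1)) <= threshold for value in row]
            for i, row in enumerate(acc)]


def longest_diagonal_streak(dark):
    # Length of the longest run of dark pixels along a top-left to
    # bottom-right diagonal.
    best = 0
    run = [[0] * len(row) for row in dark]
    for i, row in enumerate(dark):
        for j, is_dark in enumerate(row):
            if is_dark:
                prev = run[i - 1][j - 1] if i > 0 and j > 0 else 0
                run[i][j] = prev + 1
                best = max(best, run[i][j])
    return best


def keyword_present(query, reference, threshold=0.5, min_fraction=0.8):
    # Declare the keyword present when the streak covers most of the query.
    dark = binarize(accumulate(query, reference), threshold)
    return longest_diagonal_streak(dark) >= min_fraction * len(query)


def frame(phone, n_phones=3):
    # Hypothetical toy posterior vector: 0.9 on one phone, the rest shared equally.
    rest = 0.1 / (n_phones - 1)
    return [0.9 if k == phone else rest for k in range(n_phones)]
```

As a usage example, a toy query built from `frame(0)`, `frame(1)`, `frame(2)` can be passed to `keyword_present` together with a reference sequence of such frames. With real posteriors, the threshold would need tuning, and a thinning step such as skeletonization would allow warped, non-straight paths to be reduced to a one-pixel-wide streak before the decision.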