Automatic speaker independent alignment of continuous speech with its phonetic transcription using a hidden Markov model

COMSIG 88: Southern African Conference on Communications and Signal Processing, Proceedings. Pub Date: 1988-06-24. DOI: 10.1109/COMSIG.1988.49298

J. Brummer, M.W. Boetzer


Citations: 1

Abstract

A method is presented for time-aligning phonetic transcriptions with an acoustic speech waveform using hidden Markov models defined by the transcriptions. Given an utterance and its phonetic transcription, the algorithm yields the starting and ending times of all the phonemes in the transcription, relative to the start of the utterance. The model probabilities are obtained from phoneme duration probabilities and from feature probabilities for a few coarse phoneme classes. Because the classes are coarse, the method is speaker-independent. The alignment is computed with the Viterbi algorithm, and an efficient way of implementing it is given. With single-word transcriptions, the method can detect words in continuous speech, so words can be searched for using only their phonetic representations. Two different hidden Markov models (HMMs) were used, one with discrete observation symbols and one with continuous observation vectors. The continuous model works better, but the discrete one runs faster.
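The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a strictly left-to-right chain with one state per transcribed phoneme, takes per-frame log-likelihoods as given (in the paper these would come from the coarse-class feature probabilities), and replaces the paper's phoneme duration probabilities with a crude per-phoneme self-loop probability. The names `align`, `to_seconds`, `log_emit`, `self_loop`, and the 10 ms frame period are all hypothetical.

```python
import math


def align(log_emit, self_loop):
    """Viterbi forced alignment on a left-to-right chain (illustrative only).

    log_emit[t][n]: log-likelihood of frame t under the n-th phoneme of
    the transcription (assumed to be precomputed).
    self_loop[n]: probability of staying in phoneme n for another frame,
    an assumed stand-in for a real duration model.
    Returns one (start_frame, end_frame) pair per phoneme, end exclusive.
    """
    T, N = len(log_emit), len(log_emit[0])
    if T < N:
        raise ValueError("need at least one frame per phoneme")
    stay = [math.log(p) for p in self_loop]
    move = [math.log(1.0 - p) for p in self_loop]  # cost of leaving phoneme n
    neg_inf = float("-inf")

    score = [[neg_inf] * N for _ in range(T)]
    entered = [[False] * N for _ in range(T)]  # True if phoneme n begins at t
    score[0][0] = log_emit[0][0]  # the path must start in the first phoneme

    for t in range(1, T):
        for n in range(N):
            s_stay = score[t - 1][n] + stay[n]
            s_move = score[t - 1][n - 1] + move[n - 1] if n > 0 else neg_inf
            if s_move > s_stay:
                score[t][n] = s_move + log_emit[t][n]
                entered[t][n] = True
            else:
                score[t][n] = s_stay + log_emit[t][n]

    # Backtrace from the last phoneme at the last frame.
    starts = [0] * N
    n = N - 1
    for t in range(T - 1, 0, -1):
        if entered[t][n]:
            starts[n] = t
            n -= 1
    ends = starts[1:] + [T]
    return list(zip(starts, ends))


def to_seconds(segments, frame_period=0.01):
    """Convert frame indices to times relative to the utterance start."""
    return [(s * frame_period, e * frame_period) for s, e in segments]


# Toy input: 6 frames, 3 phonemes; each frame strongly favours one phoneme.
good, bad = math.log(0.9), math.log(0.05)
frames = [
    [good, bad, bad],
    [good, bad, bad],
    [bad, good, bad],
    [bad, good, bad],
    [bad, good, bad],
    [bad, bad, good],
]
segments = align(frames, self_loop=[0.5, 0.7, 0.5])
print(segments)  # → [(0, 2), (2, 5), (5, 6)]
```

Because the chain only allows "stay" or "advance", each time step needs two comparisons per state, so the cost grows with the number of frames times the number of phonemes. Whether this matches the efficient implementation the abstract mentions is not stated in the source.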
