Automatic speaker independent alignment of continuous speech with its phonetic transcription using a hidden Markov model
Authors: J. Brummer, M.W. Boetzer
Published in: COMSIG 88, Southern African Conference on Communications and Signal Processing, Proceedings (1988-06-24)
DOI: 10.1109/COMSIG.1988.49298
A method is presented for time-aligning a phonetic transcription with an acoustic speech waveform, using hidden Markov models (HMMs) defined by the transcription. Given a speech utterance and its phonetic transcription, the algorithm yields the start and end times of every phoneme in the transcription, relative to the start of the utterance. The model probabilities are obtained from phoneme duration probabilities and from feature probabilities for a few coarse phoneme classes. Because only coarse classes are used, the method is speaker-independent. The alignment is computed with the Viterbi algorithm, and an efficient implementation of that algorithm is given. With single-word transcriptions, the method can also detect words in continuous speech, so words can be searched for using only their phonetic representations. Two HMMs were used, one with discrete observation symbols and one with continuous observation vectors. The continuous model performs better, but the discrete one runs faster.
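The alignment step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one left-to-right state per phoneme, discrete observation symbols, and a geometric duration model expressed as a self-loop probability per phoneme (strictly between 0 and 1). The function name forced_align, its parameters, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import math

def forced_align(phonemes, frames, emit_logp, stay_logp):
    """Viterbi alignment over a left-to-right chain, one state per phoneme.

    phonemes  : phoneme labels taken from the transcription, in order
    frames    : discrete observation symbols, one per analysis frame
    emit_logp : phoneme -> {symbol: log P(symbol | phoneme)}   (assumed given)
    stay_logp : phoneme -> log P(remain in phoneme)            (assumed given)
    Returns [(phoneme, start_frame, end_frame)], end_frame exclusive.
    """
    n, T = len(phonemes), len(frames)
    if n == 0 or T < n:
        raise ValueError("need at least one frame per phoneme")
    NEG = float("-inf")
    # log P(leave phoneme) = log(1 - P(stay)); requires P(stay) < 1
    move_logp = [math.log1p(-math.exp(stay_logp[p])) for p in phonemes]

    score = [[NEG] * n for _ in range(T)]
    entered = [[False] * n for _ in range(T)]  # True if state s was entered at frame t
    score[0][0] = emit_logp[phonemes[0]][frames[0]]

    for t in range(1, T):
        for s in range(min(t + 1, n)):  # state s is unreachable before frame s
            stay = score[t - 1][s] + stay_logp[phonemes[s]]
            move = score[t - 1][s - 1] + move_logp[s - 1] if s > 0 else NEG
            best = max(stay, move)
            if best == NEG:
                continue
            entered[t][s] = move > stay
            score[t][s] = best + emit_logp[phonemes[s]][frames[t]]

    if score[T - 1][n - 1] == NEG:
        raise ValueError("no valid alignment for these probabilities")

    # Backtrace from the last phoneme at the last frame to find boundaries.
    starts = [0] * n
    s = n - 1
    for t in range(T - 1, 0, -1):
        if entered[t][s]:
            starts[s] = t
            s -= 1

    return [
        (p, starts[i], starts[i + 1] if i + 1 < n else T)
        for i, p in enumerate(phonemes)
    ]

# Toy example: "N" is a noise-like symbol, "V" a voiced symbol.
emit = {
    "s": {"N": math.log(0.9), "V": math.log(0.1)},
    "a": {"N": math.log(0.1), "V": math.log(0.9)},
}
stay = {"s": math.log(0.5), "a": math.log(0.5)}
print(forced_align(["s", "a"], ["N", "N", "N", "V", "V"], emit, stay))
# prints [('s', 0, 3), ('a', 3, 5)]
```

Multiplying the returned frame indices by the frame period gives times relative to the start of the utterance. Restricting each state to a stay-or-advance transition keeps the search cost proportional to the number of frames times the number of phonemes, which is one common way to make Viterbi alignment cheap; the paper's own efficiency technique is not described in the abstract.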