Connected sentence recognition using diphone-like templates

ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing; Pub Date: 1988-04-11; DOI: 10.1109/ICASSP.1988.196621

A. Rosenberg

Citations: 6

Abstract

A template-based connected speech recognition system that represents words as sequences of diphone-like segments has been implemented and tested on a database of 50 phonetically balanced sentences uttered 5 times by a single male talker. The sentences contain 250 words, of which 80% are monosyllabic. The inventory of segments is divided into two principal classes: single-phone segments, such as vowels, nasals, fricatives, and stop bursts, and diphone segments, including consonant-vowel, vowel-consonant, and consonant-consonant combinations. Words are represented by network models whose nodes are these segments. Word models incorporate juncture branches to and from other words. In all, 400 segments are required to represent the 250 vocabulary words. Templates representing these segments are extracted from a database of 450 training sentences uttered by the same talker. Recognition is carried out by a series of matching and search processes, successively for segments, words, word strings, and sentences. The performance obtained to date has yielded 63% correct recognition of content words and approximately 30% recognition of function words.
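The idea of a word as a network whose nodes are single-phone or diphone segments, with juncture branches at its edges, can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the `Segment` and `WordModel` classes, the example word "cat", its segment labels, and the choice to model juncture branches as alternative entry and exit nodes are hypothetical and are not taken from the paper, whose actual inventory and network topology the abstract does not give.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str  # hypothetical label, e.g. "k-ae" for a diphone
    kind: str   # "phone" (single phone) or "diphone"


@dataclass
class WordModel:
    word: str
    nodes: list    # list of Segment objects
    edges: dict    # node index -> list of successor node indices
    entries: list  # nodes where the word may start (assumed juncture-in branches)
    exits: list    # nodes where the word may end (assumed juncture-out branches)

    def paths(self):
        """Enumerate every segment-label sequence from an entry to an exit node."""
        found = []

        def walk(i, acc):
            acc = acc + [self.nodes[i].label]
            if i in self.exits:
                found.append(acc)
            for j in self.edges.get(i, []):
                walk(j, acc)

        for start in self.entries:
            walk(start, [])
        return found


# Hypothetical model of "cat": burst, CV diphone, vowel, VC diphone, burst.
cat = WordModel(
    word="cat",
    nodes=[
        Segment("k", "phone"),
        Segment("k-ae", "diphone"),
        Segment("ae", "phone"),
        Segment("ae-t", "diphone"),
        Segment("t", "phone"),
    ],
    edges={0: [1], 1: [2], 2: [3], 3: [4]},
    entries=[0, 1],  # the initial burst may be skipped at a word juncture
    exits=[3, 4],    # the final burst may be absorbed by the next word
)

for p in cat.paths():
    print(" ".join(p))
```

In a recognizer of the kind the abstract describes, each node would be matched against stored segment templates and the best-scoring path through the word networks would be searched for; the sketch only enumerates the allowed segment sequences and performs no acoustic matching.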
