如何破坏一个美好的海滩你唱平静的香

Proceedings of the 10th international conference on Intelligent user interfaces Pub Date : 2005-01-10 DOI:10.1145/1040830.1040898

H. Lieberman, A. Faaborg, Waseem Daher, J. Espinosa

{"title":"如何破坏一个美好的海滩你唱平静的香","authors":"H. Lieberman, A. Faaborg, Waseem Daher, J. Espinosa","doi":"10.1145/1040830.1040898","DOIUrl":null,"url":null,"abstract":"A principal problem in speech recognition is distinguishing between words and phrases that sound similar but have different meanings. Speech recognition programs produce a list of weighted candidate hypotheses for a given audio segment, and choose the \"best\" candidate. If the choice is incorrect, the user must invoke a correction interface that displays a list of the hypotheses and choose the desired one. The correction interface is time-consuming, and accounts for much of the frustration of today's dictation systems. Conventional dictation systems prioritize hypotheses based on language models derived from statistical techniques such as n-grams and Hidden Markov Models.We propose a supplementary method for ordering hypotheses based on Commonsense Knowledge. We filter acoustical and word-frequency hypotheses by testing their plausibility with a semantic network derived from 700,000 statements about everyday life. This often filters out possibilities that \"don't make sense\" from the user's viewpoint, and leads to improved recognition. Reducing the hypothesis space in this way also makes possible streamlined correction interfaces that improve the overall throughput of dictation systems.","PeriodicalId":376409,"journal":{"name":"Proceedings of the 10th international conference on Intelligent user interfaces","volume":"211 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"61","resultStr":"{\"title\":\"How to wreck a nice beach you sing calm incense\",\"authors\":\"H. Lieberman, A. Faaborg, Waseem Daher, J. Espinosa\",\"doi\":\"10.1145/1040830.1040898\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A principal problem in speech recognition is distinguishing between words and phrases that sound similar but have different meanings. Speech recognition programs produce a list of weighted candidate hypotheses for a given audio segment, and choose the \\\"best\\\" candidate. If the choice is incorrect, the user must invoke a correction interface that displays a list of the hypotheses and choose the desired one. The correction interface is time-consuming, and accounts for much of the frustration of today's dictation systems. Conventional dictation systems prioritize hypotheses based on language models derived from statistical techniques such as n-grams and Hidden Markov Models.We propose a supplementary method for ordering hypotheses based on Commonsense Knowledge. We filter acoustical and word-frequency hypotheses by testing their plausibility with a semantic network derived from 700,000 statements about everyday life. This often filters out possibilities that \\\"don't make sense\\\" from the user's viewpoint, and leads to improved recognition. Reducing the hypothesis space in this way also makes possible streamlined correction interfaces that improve the overall throughput of dictation systems.\",\"PeriodicalId\":376409,\"journal\":{\"name\":\"Proceedings of the 10th international conference on Intelligent user interfaces\",\"volume\":\"211 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-01-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"61\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 10th international conference on Intelligent user interfaces\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1040830.1040898\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 10th international conference on Intelligent user interfaces","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1040830.1040898","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 61

摘要

语音识别的一个主要问题是区分听起来相似但含义不同的单词和短语。语音识别程序为给定的音频片段生成加权候选假设列表，并选择“最佳”候选。如果选择不正确，用户必须调用一个显示假设列表的修正界面，并选择所需的一个。校正界面非常耗时，也是当今听写系统令人沮丧的主要原因。传统的听写系统基于统计技术(如n-grams和隐马尔可夫模型)衍生的语言模型来优先考虑假设。提出了一种基于常识知识的假设排序补充方法。我们过滤声学和词频假设，通过测试其合理性的语义网络，从70万个日常生活的陈述。这通常会过滤掉从用户角度看“没有意义”的可能性，从而提高识别能力。以这种方式减少假设空间也使简化的校正接口成为可能，从而提高听写系统的总体吞吐量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

How to wreck a nice beach you sing calm incense

A principal problem in speech recognition is distinguishing between words and phrases that sound similar but have different meanings. Speech recognition programs produce a list of weighted candidate hypotheses for a given audio segment, and choose the "best" candidate. If the choice is incorrect, the user must invoke a correction interface that displays a list of the hypotheses and choose the desired one. The correction interface is time-consuming, and accounts for much of the frustration of today's dictation systems. Conventional dictation systems prioritize hypotheses based on language models derived from statistical techniques such as n-grams and Hidden Markov Models.We propose a supplementary method for ordering hypotheses based on Commonsense Knowledge. We filter acoustical and word-frequency hypotheses by testing their plausibility with a semantic network derived from 700,000 statements about everyday life. This often filters out possibilities that "don't make sense" from the user's viewpoint, and leads to improved recognition. Reducing the hypothesis space in this way also makes possible streamlined correction interfaces that improve the overall throughput of dictation systems.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 10th international conference on Intelligent user interfaces

自引率

0.00%

发文量