Towards robust methods for spoken document retrieval

5th International Conference on Spoken Language Processing (ICSLP 1998). Pub Date: 1998-11-30. DOI: 10.21437/ICSLP.1998-480

Kenney Ng


Citations: 43

Abstract

In this paper, we investigate a number of robust indexing and retrieval methods in an effort to improve spoken document retrieval performance in the presence of speech recognition errors. In particular, we examine expanding the original query representation to include confusible terms; developing a new document-query retrieval measure based on approximate matching that is less sensitive to recognition errors; expanding the document representation to include multiple recognition hypotheses; modifying the original query using automatic relevance feedback to include new terms found in the top ranked documents; and combining information from multiple subword unit representations. We study the different methods individually and then explore the effects of combining them. Experiments on radio broadcast news data show that using a combination of these methods can improve retrieval performance by over 20%.
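
As an illustration only of the first idea listed in the abstract (expanding the query to include confusable terms), the following is a minimal sketch. It assumes overlapping phone trigrams as the indexing units, a hand-written confusion table, a fixed 0.5 weight for expanded terms, and a simple inner-product score. None of these choices, nor the function names, are taken from the paper.

```python
from collections import Counter

def phone_ngrams(phones, n=3):
    """Return overlapping n-grams (as tuples) from a phone sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def expand_query(terms, confusions, weight=0.5):
    """Add single-substitution variants of each query term at a reduced weight.

    `confusions` is a hypothetical table mapping a phone to the phones a
    recognizer might output in its place.
    """
    weighted = Counter()
    for term in terms:
        weighted[term] += 1.0
        for i, phone in enumerate(term):
            for alt in confusions.get(phone, ()):
                variant = term[:i] + (alt,) + term[i + 1:]
                weighted[variant] += weight
    return weighted

def score(query_weights, doc_terms):
    """Inner product of query term weights and document term counts."""
    doc_counts = Counter(doc_terms)
    return sum(w * doc_counts[t] for t, w in query_weights.items())

# Made-up example: the recognizer heard "b" where the speaker said "p".
confusions = {"p": ["b"], "t": ["d"]}
query = phone_ngrams(["p", "ae", "t"])
doc = phone_ngrams(["b", "ae", "t", "s"])

plain = score(Counter(query), doc)                       # no exact trigram match
expanded = score(expand_query(query, confusions), doc)   # matches via the "p" -> "b" variant
```

In this toy case the unexpanded query shares no trigram with the misrecognized document, while the expanded query still retrieves it at a reduced weight, which is the kind of robustness to recognition errors the abstract describes.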
