Enhancing query expansion for semantic retrieval of spoken content with automatically discovered acoustic patterns

2013 IEEE International Conference on Acoustics, Speech and Signal Processing | Pub Date: 2013-10-21 | DOI: 10.1109/ICASSP.2013.6639283

Hung-yi Lee, Yun-Chiao Li, Cheng-Tao Chung, Lin-Shan Lee


Citations: 12

Abstract

Query expansion techniques were originally developed for text information retrieval to retrieve documents that do not contain the query terms but are semantically related to the query. They assume that the terms occurring frequently in the top-ranked documents of the first-pass retrieval results are query-related, and use those terms to expand the query for a second-pass retrieval. However, when this approach is applied to spoken content retrieval, the inevitable recognition errors and out-of-vocabulary (OOV) problems in automatic speech recognition (ASR) make it difficult for many query-related terms to be included in the expanded query, and much of the information carried by the speech signal is lost during recognition and cannot be recovered. In this paper, we propose to use a second ASR engine based on acoustic patterns automatically discovered from the spoken archive used for retrieval. These acoustic patterns are discovered directly from the signal characteristics, and can therefore compensate to a good extent for the information lost during recognition. When a text query is entered, the system generates the first-pass retrieval results based on the transcriptions of the spoken segments obtained via the conventional ASR. The acoustic patterns occurring frequently in the top-ranked spoken segments of the first-pass results are considered query-related, and the spoken segments containing these query-related acoustic patterns are retrieved. In this way, even when some query-related terms are OOV or incorrectly recognized, the segments containing them can still be retrieved through the corresponding acoustic patterns. Preliminary experiments on Mandarin broadcast news offered very encouraging results.
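The two-pass flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the toy segment IDs, the pattern labels such as `p7`, and the plain count-based scoring are hypothetical and are not taken from the paper, which does not specify its retrieval model in the abstract. The sketch only shows the idea that patterns frequent in the top-ranked first-pass segments can pull in a segment whose transcript missed the query term.

```python
from collections import Counter

def first_pass(query_terms, transcripts):
    """Rank segments by how often query terms appear in their ASR transcript."""
    scores = {}
    for seg_id, words in transcripts.items():
        score = sum(words.count(t) for t in query_terms)
        if score > 0:
            scores[seg_id] = score
    # Higher score first; ties broken by segment ID for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))

def query_related_patterns(ranked, patterns, top_n=2, top_k=2):
    """Take the top_k acoustic patterns most frequent in the top_n segments."""
    counts = Counter()
    for seg_id in ranked[:top_n]:
        counts.update(patterns[seg_id])
    return [p for p, _ in counts.most_common(top_k)]

def second_pass(ranked, patterns, related):
    """Append segments missed in the first pass that contain related patterns."""
    seen = set(ranked)
    extra = {}
    for seg_id, pats in patterns.items():
        if seg_id in seen:
            continue
        overlap = sum(pats.count(p) for p in related)
        if overlap > 0:
            extra[seg_id] = overlap
    return ranked + sorted(extra, key=lambda s: (-extra[s], s))

# Toy data (hypothetical): word transcripts from the conventional ASR and
# pattern sequences from the second, pattern-based recognizer.
transcripts = {
    "s1": ["obama", "visits", "taipei"],
    "s2": ["president", "obama", "speech"],
    "s3": ["a", "bomb", "a", "speech"],   # query term misrecognized here
    "s4": ["weather", "report"],
}
patterns = {
    "s1": ["p7", "p3", "p9"],
    "s2": ["p1", "p7", "p3"],
    "s3": ["p7", "p3", "p5"],
    "s4": ["p2", "p8"],
}

ranked = first_pass(["obama"], transcripts)
related = query_related_patterns(ranked, patterns)
final = second_pass(ranked, patterns, related)
print(final)  # ['s1', 's2', 's3']
```

In this toy run, segment `s3` is absent from the first-pass list because its transcript lacks the query term, but it is recovered in the second pass because it shares the patterns `p7` and `p3` with the top-ranked segments. A real system would presumably weight patterns and merge scores rather than simply appending segments.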


