Selection of best match keyword using spoken term detection for spoken document indexing

Kentaro Domoto, T. Utsuro, N. Sawada, H. Nishizaki
Published in: Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific
Publication date: 2014-12-01
DOI: 10.1109/APSIPA.2014.7041589 (https://doi.org/10.1109/APSIPA.2014.7041589)

Abstract

This paper presents a novel keyword-selection framework for spoken document indexing. It uses spoken term detection (STD) to select the best-matching keyword from a set of query candidates, for use in spoken document retrieval. The method first creates a keyword set containing keywords that are likely to occur in a spoken document. STD is then run with every keyword as a query term, yielding a detection result that pairs each keyword with its detection intervals in the spoken document. Keywords whose detection intervals compete with one another are ranked by STD matching cost, and the best one, with the longest duration among the competing detections, is selected. The selected keyword is the final output of the STD process and serves as an index word for the spoken document. The framework was evaluated in an STD task using lecture speech as the spoken documents. The results show that it was effective at preventing false detections and at annotating spoken documents with keyword indices.
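The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how competing intervals are identified or how matching cost and duration are combined, so the sketch assumes that detections compete when their time intervals overlap, that a lower matching cost is better, and that duration breaks ties in favor of the longer detection. The names `Detection`, `group_competing`, and `select_index_words` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One STD hit: a keyword and the interval where it was detected."""
    keyword: str
    start: float  # seconds
    end: float    # seconds
    cost: float   # STD matching cost (assumption: lower is better)

    @property
    def duration(self) -> float:
        return self.end - self.start


def group_competing(detections):
    """Group detections whose intervals overlap, directly or transitively.

    Assumption: "competitive" detections are those that overlap in time.
    """
    groups = []
    current = []
    current_end = None
    for det in sorted(detections, key=lambda d: (d.start, d.end)):
        if current and det.start < current_end:
            # Overlaps the running group: extend it.
            current.append(det)
            current_end = max(current_end, det.end)
        else:
            # No overlap: close the running group and start a new one.
            if current:
                groups.append(current)
            current = [det]
            current_end = det.end
    if current:
        groups.append(current)
    return groups


def select_index_words(detections):
    """Keep one detection per competing group.

    Assumption: rank by lowest matching cost, then by longest duration.
    """
    return [
        min(group, key=lambda d: (d.cost, -d.duration))
        for group in group_competing(detections)
    ]


# Made-up detections, for illustration only.
hits = [
    Detection("speech", 1.0, 1.5, 0.10),
    Detection("speech recognition", 1.0, 2.2, 0.10),
    Detection("recognition", 1.5, 2.2, 0.30),
    Detection("retrieval", 5.0, 5.8, 0.20),
]
for det in select_index_words(hits):
    print(det.keyword, det.start, det.end)
```

With this made-up input, the three overlapping detections form one competing group and only one of them is kept as the index word, while the isolated detection is kept as is. A different tie-breaking rule, such as preferring duration over cost, would change only the `key` function in `select_index_words`.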