An unsupervised language model adaptation based on keyword clustering and query availability estimation

A. Ito, Y. Kajiura, S. Makino, M. Suzuki
2008 International Conference on Audio, Language and Image Processing. Published 2008-07-07. DOI: 10.1109/ICALIP.2008.4590103 (https://doi.org/10.1109/ICALIP.2008.4590103)
Citations: 5

Abstract

Language model (LM) adaptation using text data downloaded from the WWW is an efficient way to train a topic-specific LM. We are developing an unsupervised LM adaptation method that uses data from the Web. One key point of unsupervised Web-based LM adaptation is how to select keywords to compose the search query. In this paper, we propose a new method of selecting keywords from keyword candidates, which uses a keyword clustering technique based on word similarities. The other key point is how to determine the number of pages to download for each query. We propose a method to estimate a "query availability" from a small number of downloaded Web pages. The experimental results showed that determining the number of downloaded pages using the query availability was more effective than conventional methods that determine the number of pages empirically.
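
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity measure, the clustering algorithm, or the definition of query availability. The sketch assumes cosine similarity over toy word vectors, a greedy single-pass clustering, and a hypothetical availability proxy (the fraction of sampled pages whose words sufficiently overlap a seed vocabulary). All function names, parameters, and thresholds are illustrative assumptions.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_keywords(vectors, threshold=0.8):
    """Greedy single-pass clustering (an assumed stand-in for the paper's
    technique): a word joins the first cluster whose first member is at
    least `threshold` similar to it, otherwise it starts a new cluster."""
    clusters = []
    for word, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


def query_availability(sample_pages, seed_vocab, min_overlap=0.5):
    """Hypothetical proxy for query availability: the fraction of a small
    sample of downloaded pages in which at least `min_overlap` of the
    distinct words also occur in the seed vocabulary."""
    if not sample_pages:
        return 0.0
    useful = 0
    for page in sample_pages:
        words = set(page.lower().split())
        if words and len(words & seed_vocab) / len(words) >= min_overlap:
            useful += 1
    return useful / len(sample_pages)


def allocate_pages(availabilities, total_pages):
    """Split a download budget across queries in proportion to their
    estimated availability (an assumed allocation rule)."""
    total = sum(availabilities.values())
    if total == 0:
        return {query: 0 for query in availabilities}
    return {
        query: int(total_pages * value / total)
        for query, value in availabilities.items()
    }
```

In this sketch, each cluster would supply the keywords for one search query, a few pages per query would be downloaded to estimate its availability, and the remaining download budget would be split with `allocate_pages`. How the paper actually maps clusters to queries and availability to page counts is not stated in the abstract.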