Probabilistic term variant generator for biomedical terms

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. Pub Date: 2003-07-28. DOI: 10.1145/860435.860467

Yoshimasa Tsuruoka, Junichi Tsujii

Citations: 32

Abstract

This paper presents an algorithm to generate possible variants for biomedical terms. The algorithm gives each variant its generation probability representing its plausibility, which is potentially useful for query and dictionary expansions. The probabilistic rules for generating variants are automatically learned from raw texts using an existing abbreviation extraction technique. Our method, therefore, requires no linguistic knowledge or labor-intensive natural language resource. We conducted an experiment using 83,142 MEDLINE abstracts for rule induction and 18,930 abstracts for testing. The results indicate that our method will significantly increase the number of retrieved documents for long biomedical terms.
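As a rough illustration of what assigning each variant a generation probability could look like, the following is a minimal sketch and not the authors' implementation. The rule table, its probabilities, the function name `generate_variants`, and the thresholds are all hypothetical; the paper learns its rules automatically from MEDLINE text, whereas here they are hard-coded for demonstration. The sketch assumes that a variant's probability is the product of the probabilities of the rules applied to reach it, keeping the best-scoring derivation.

```python
import heapq
from typing import List, Tuple

# Hypothetical rewrite rules: (source substring, replacement, rule probability).
# These values are invented for illustration only; they are not taken from the paper.
# Every probability must be strictly below 1.0 so that the search terminates.
RULES: List[Tuple[str, str, float]] = [
    ("-", " ", 0.3),
    (" ", "-", 0.1),
    ("alpha", "a", 0.05),
]


def generate_variants(
    term: str,
    rules: List[Tuple[str, str, float]] = RULES,
    min_prob: float = 0.01,
    max_variants: int = 10,
) -> List[Tuple[str, float]]:
    """Return (variant, probability) pairs, most probable first.

    The input term itself has probability 1.0. Each rule application
    multiplies the probability by that rule's probability, and variants
    whose probability falls below min_prob are discarded.
    """
    best = {term: 1.0}           # best known probability for each variant
    frontier = [(-1.0, term)]    # max-heap emulated with negated probabilities

    while frontier:
        neg_prob, current = heapq.heappop(frontier)
        prob = -neg_prob
        if prob < best.get(current, 0.0):
            continue  # stale entry: a better derivation was found already

        for src, dst, rule_prob in rules:
            # Apply the rule at every position where its source matches.
            start = current.find(src)
            while start != -1:
                variant = current[:start] + dst + current[start + len(src):]
                new_prob = prob * rule_prob
                if new_prob >= min_prob and new_prob > best.get(variant, 0.0):
                    best[variant] = new_prob
                    heapq.heappush(frontier, (-new_prob, variant))
                start = current.find(src, start + 1)

    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_variants]


if __name__ == "__main__":
    for variant, prob in generate_variants("NF-kappa B"):
        print(f"{prob:.3f}  {variant}")
```

In a query-expansion setting, the ranked list could be cut off at a probability threshold so that only plausible variants are added to the query; the threshold used here is an arbitrary choice.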


