Biomedical information retrieval across languages.

Philipp Daumke, Kornél Markü, Michael Poprat, Stefan Schulz, Rüdiger Klar
Medical informatics and the Internet in medicine, 32(2), pages 131-47. Published 2007-06-01. Journal Article.
DOI: 10.1080/14639230701197587 (https://doi.org/10.1080/14639230701197587)
Citations: 11

Abstract

This work presents a new dictionary-based approach to biomedical cross-language information retrieval (CLIR) that addresses many of the general and domain-specific challenges in current CLIR research. Our method is based on a multilingual lexicon that was generated partly manually and partly automatically, and currently covers six European languages. It contains morphologically meaningful word fragments, termed subwords. Using subwords instead of entire words significantly reduces the number of lexical entries necessary to sufficiently cover a specific language and domain. Mediation between queries and documents is based on these subwords as well as on lists of word-n-grams that are generated from large monolingual corpora and constitute possible translation units. The translations are then sent to a standard Internet search engine. This process makes our approach an effective tool for searching the biomedical content of the World Wide Web in different languages. We evaluate this approach using the OHSUMED corpus, a large medical document collection, within a cross-language retrieval setting.
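
The subword mediation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the segmentation algorithm or the lexicon format, so the greedy longest-match strategy, the toy lexicon entries, the concept identifiers (`#stomach`, `#inflamm`), and the function names `segment` and `to_concepts` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon: language -> {subword: language-independent concept ID}.
# The entries are illustrative only and are not taken from the paper's lexicon.
LEXICON: Dict[str, Dict[str, str]] = {
    "en": {"gastr": "#stomach", "itis": "#inflamm", "hepat": "#liver"},
    "de": {"magen": "#stomach", "entzuend": "#inflamm", "leber": "#liver"},
}


def segment(word: str, subwords: Dict[str, str]) -> List[str]:
    """Split a word into known subwords by greedy longest match (an assumed strategy).

    Characters that do not start any known subword are skipped.
    """
    word = word.lower()
    pieces: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in subwords:
                pieces.append(word[i:j])
                i = j
                break
        else:
            i += 1  # no subword starts here; skip one character
    return pieces


def to_concepts(text: str, lang: str) -> List[str]:
    """Map every word of a query or document to its sequence of concept IDs."""
    subwords = LEXICON[lang]
    concepts: List[str] = []
    for word in text.split():
        concepts.extend(subwords[piece] for piece in segment(word, subwords))
    return concepts


# An English query and a German term reduce to the same concept sequence,
# which is what allows matching across languages in this sketch.
english = to_concepts("Gastritis", "en")
german = to_concepts("Magenentzuendung", "de")
print(english, german, english == german)
```

Because both terms reduce to the same identifiers, a query in one language can be matched against documents in another without a full-word dictionary entry for either term. This reflects the abstract's point that subwords reduce the number of lexical entries needed.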
