Biomedical information retrieval across languages.

Medical informatics and the Internet in medicine. Pub Date: 2007-06-01. DOI: 10.1080/14639230701197587

Philipp Daumke, Kornél Markó, Michael Poprat, Stefan Schulz, Rüdiger Klar

Volume 32, issue 2, pages 131-47. Publication type: Journal Article.

Citations: 11

Abstract

This work presents a new dictionary-based approach to biomedical cross-language information retrieval (CLIR) that addresses many of the general and domain-specific challenges in current CLIR research. Our method is based on a multilingual lexicon that was generated partly manually and partly automatically, and currently covers six European languages. It contains morphologically meaningful word fragments, termed subwords. Using subwords instead of entire words significantly reduces the number of lexical entries necessary to sufficiently cover a specific language and domain. Mediation between queries and documents is based on these subwords as well as on lists of word-n-grams that are generated from large monolingual corpora and constitute possible translation units. The translations are then sent to a standard Internet search engine. This process makes our approach an effective tool for searching the biomedical content of the World Wide Web in different languages. We evaluate this approach using the OHSUMED corpus, a large medical document collection, within a cross-language retrieval setting.
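The subword idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how words are segmented, so greedy longest-match segmentation is an assumption here, and the lexicon entries, the identifiers such as `#stomach`, and the function names `segment` and `to_interlingua` are all hypothetical. The sketch only shows how a small subword lexicon could map words from two languages onto the same language-independent representation.

```python
# Toy multilingual subword lexicon (all entries invented for illustration).
# Each subword maps to a language-independent identifier; None marks
# fragments that carry no meaning of their own (e.g. inflection suffixes).
LEXICON = {
    "en": {"gastr": "#stomach", "algia": "#pain",
           "stomach": "#stomach", "pain": "#pain"},
    "de": {"magen": "#stomach", "schmerz": "#pain", "en": None},
}


def segment(word, lang):
    """Split a word into known subwords by greedy longest match
    (an assumed strategy) and return their identifiers."""
    entries = LEXICON[lang]
    word = word.lower()
    ids = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in entries:
                if entries[piece] is not None:
                    ids.append(entries[piece])
                i = j
                break
        else:
            i += 1  # no known subword starts here; skip one character
    return ids


def to_interlingua(text, lang):
    """Map every whitespace-separated word of a text to subword identifiers."""
    ids = []
    for word in text.split():
        ids.extend(segment(word, lang))
    return ids


if __name__ == "__main__":
    query = to_interlingua("Magenschmerzen", "de")
    document = to_interlingua("gastralgia", "en")
    print(query, document, query == document)
```

In this toy setting a German query and an English document term end up with the same identifier sequence even though they share no surface string, and three English entries cover both "gastralgia" and "stomach pain". This is the kind of saving in lexicon size that the abstract attributes to subwords.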
