MHeTRep: A multilingual semantically tagged health terms repository

IF 1.9, Zone 3 (Computer Science), Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Natural Language Engineering Pub Date : 2022-02-25 DOI:10.1017/s1351324922000055

J. Vivaldi, H. Rodríguez

Natural Language Engineering, Volume 29, pages 1364-1401

Citations: 0

Abstract

This paper presents MHeTRep, a multilingual medical terminology, and the methodology followed for its compilation. The terminology is organised into one vocabulary per language. All terms in the collection are semantically tagged with a tagset corresponding to the top categories of the Snomed-CT ontology. Where possible, individual terms are linked to their equivalents in the other languages. Although many NLP resources and tools claim to be domain independent, in practice their application to specific tasks is often restricted to specific domains; outside those domains their performance degrades notably. Because the accuracy of NLP resources drops heavily when they are applied in environments different from those for which they were built, they must be tuned to the new environment. A domain terminology usually facilitates and accelerates the adaptation of general-domain NLP applications to a new domain. This is particularly important in medicine, a domain undergoing great expansion. The proposed method takes Snomed-CT as its starting point. From there, using 13 multilingual resources that cover the most relevant medical concepts, such as drugs, anatomy, clinical findings and procedures, we built a large resource covering seven languages and totalling more than two million semantically tagged terms. The resulting collection has been intensively evaluated in several ways across the languages and domain categories involved. Our hypothesis is that MHeTRep can be used advantageously over the original resources for a number of NLP use cases and can likely be extended to other languages.
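
The organisation described in the abstract (one vocabulary per language, a semantic tag on every term, and optional links between equivalent terms) could be modelled roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `TermEntry`, `TermRepository`, `add`, `link` and `lookup`, the case-insensitive lookup, the tag strings and the sample terms are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    """One term in one language's vocabulary (hypothetical structure)."""
    term: str
    language: str
    tag: str  # assumed to name a Snomed-CT top category
    links: dict = field(default_factory=dict)  # language code -> equivalent term


class TermRepository:
    """Hypothetical container: one vocabulary (dict) per language."""

    def __init__(self):
        self.vocabularies = {}  # language code -> {lowercased term: TermEntry}

    def add(self, term, language, tag):
        entry = TermEntry(term, language, tag)
        self.vocabularies.setdefault(language, {})[term.lower()] = entry
        return entry

    def link(self, a, b):
        # Record the two entries as equivalents of each other.
        a.links[b.language] = b.term
        b.links[a.language] = a.term

    def lookup(self, term, language):
        # Returns None when the language or the term is unknown.
        return self.vocabularies.get(language, {}).get(term.lower())


# Illustrative data only; not taken from MHeTRep.
repo = TermRepository()
en = repo.add("Aspirin", "en", "Pharmaceutical / biologic product")
es = repo.add("aspirina", "es", "Pharmaceutical / biologic product")
repo.link(en, es)

hit = repo.lookup("aspirin", "en")
print(hit.tag, hit.links)
```

Keeping a separate dictionary per language mirrors the "one vocabulary for each language" layout, while the `links` field covers the "when possible" cross-language equivalents without requiring every term to have one.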

Source journal

Natural Language Engineering (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore

5.90

Self-citation rate

12.00%

Articles published

Review time

>12 weeks

Journal description: Natural Language Engineering meets the needs of professionals and researchers working in all areas of computerised language processing, whether from the perspective of theoretical or descriptive linguistics, lexicology, computer science or engineering. Its aim is to bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use. As well as publishing research articles on a broad range of topics, from text analysis, machine translation, information retrieval and speech analysis and generation to integrated systems and multimodal interfaces, it also publishes special issues on specific areas and technologies within these topics, an industry watch column and book reviews.