Improving term candidates selection using terminological tokens

Computational terminology and filtering of terminological information. Published 2018-05-31. DOI: 10.1075/TERM.00016.VAZ

M. Vàzquez, A. Oliver


Citations: 9

Abstract

Terms identified in domain-specific corpora by computational methods have to be validated manually by specialists, a highly time-consuming activity. To reduce this effort and improve term candidate selection, we implemented the Token Slot Recognition method, a filtering method based on terminological tokens that is used to rank term candidates extracted from domain-specific corpora. This paper presents the implementation of this term candidate filtering method in both linguistic and statistical approaches to automatic term extraction, using several domain-specific corpora in different languages. We observed that the filtering method improves term candidate selection, ranking more terms at the top of the term candidate list than raw frequency does; for statistical term extraction, the improvement is between 15% and 25% in both precision and recall. Our analyses further revealed a reduction in the number of term candidates that specialists have to validate manually. In conclusion, the Token Slot Recognition filtering method significantly reduces the number of term candidates extracted automatically from domain-specific corpora, so that specialists can validate them easily and quickly.
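The abstract does not spell out how Token Slot Recognition scores candidates, so the following Python sketch only illustrates the general idea of filtering and ranking by terminological tokens under stated assumptions. It assumes that a "terminological token" is a token seen at a given position (slot) within terms of the same length in a reference terminology, that candidates with no such token are discarded, and that the remaining candidates are ranked by the number of slot matches and then by frequency. The helper names (`build_slot_index`, `slot_score`, `rank_candidates`, `rank_by_frequency`, `precision_at_k`) and the toy data are hypothetical; this is not the authors' implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed slot representation: (token, 0-based position, term length in tokens).
Slot = Tuple[str, int, int]


def build_slot_index(reference_terms: Iterable[str]) -> Set[Slot]:
    """Record every token of a reference terminology together with its slot."""
    index: Set[Slot] = set()
    for term in reference_terms:
        tokens = term.lower().split()
        for position, token in enumerate(tokens):
            index.add((token, position, len(tokens)))
    return index


def slot_score(candidate: str, index: Set[Slot]) -> int:
    """Count candidate tokens that occupy a slot already seen in the reference terms."""
    tokens = candidate.lower().split()
    return sum((tok, pos, len(tokens)) in index for pos, tok in enumerate(tokens))


def rank_candidates(freqs: Dict[str, int], index: Set[Slot]) -> List[str]:
    """Drop candidates with no slot match, then sort by score and frequency (descending)."""
    scored = [(slot_score(c, index), f, c) for c, f in freqs.items()]
    kept = [item for item in scored if item[0] > 0]
    kept.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [c for _, _, c in kept]


def rank_by_frequency(freqs: Dict[str, int]) -> List[str]:
    """Baseline: keep all candidates and rank them by raw frequency only."""
    return sorted(freqs, key=lambda c: (-freqs[c], c))


def precision_at_k(ranking: List[str], gold: Set[str], k: int) -> float:
    """Share of the top-k ranked candidates that are validated terms."""
    top = ranking[:k]
    return sum(c in gold for c in top) / len(top) if top else 0.0


# Hypothetical toy data: reference terminology, extracted candidates, validated terms.
reference = ["blood pressure", "heart rate", "cardiac arrest"]
candidates = {"blood pressure": 12, "next week": 30, "heart failure": 5,
              "cardiac output": 4, "last year": 20}
gold = {"blood pressure", "heart failure", "cardiac output"}

index = build_slot_index(reference)
filtered = rank_candidates(candidates, index)
baseline = rank_by_frequency(candidates)
print(filtered)  # ['blood pressure', 'heart failure', 'cardiac output']
print(precision_at_k(filtered, gold, 3), round(precision_at_k(baseline, gold, 3), 2))  # 1.0 0.33
```

In this toy run the slot filter leaves three candidates to validate instead of five, and all three are terms, whereas ranking by raw frequency puts the high-frequency non-terms "next week" and "last year" at the top. Comparing both rankings with precision at a fixed cut-off mirrors, in miniature, the kind of comparison against raw frequency that the abstract reports.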

