Improving term candidates selection using terminological tokens
Authors: M. Vàzquez, A. Oliver
Journal: Computational terminology and filtering of terminological information
Published: 2018-05-31
DOI: https://doi.org/10.1075/TERM.00016.VAZ
Citations: 9
Abstract
Terms identified in domain-specific corpora by computational methods have to be
validated manually by specialists, which is a highly time-consuming activity.
To reduce this effort
and improve term candidate selection, we implemented the Token Slot Recognition
method, a filtering method based on terminological tokens which is used to rank
extracted term candidates from domain-specific corpora. This paper presents our
implementation of this term candidate filtering method within linguistic and
statistical approaches to automatic term extraction, applied to several
domain-specific corpora in different languages. We observed that the filtering
method outperforms raw frequency in term candidate selection, ranking a higher
number of terms at the top of the term candidate list; for statistical term
extraction, the improvement is between 15% and 25% in both precision and
recall. Our analyses further revealed a reduction in the number of
term candidates to be validated manually by specialists. In conclusion, the
Token Slot Recognition filtering method significantly reduces the number of
term candidates extracted automatically from domain-specific corpora, so
specialists can validate them more easily and quickly.
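
The abstract does not spell out how candidates are scored, so the following is
only an illustrative sketch of ranking by terminological tokens, not the
authors' implementation. It assumes that a candidate is more likely to be a
term when its tokens occur in the same position in already validated terms of
the domain. The names `build_slot_index`, `slot_score` and `rank_candidates`,
the reference terms and the frequencies are all hypothetical.

```python
from collections import defaultdict


def build_slot_index(reference_terms):
    """Collect, for each token position, the tokens seen in known terms.

    Assumption: a "slot" is simply the zero-based position of a token
    inside a term. The paper may define slots differently.
    """
    slots = defaultdict(set)
    for term in reference_terms:
        for position, token in enumerate(term.lower().split()):
            slots[position].add(token)
    return dict(slots)


def slot_score(candidate, slots):
    """Count the tokens of a candidate that fill a known slot."""
    tokens = candidate.lower().split()
    return sum(
        1
        for position, token in enumerate(tokens)
        if token in slots.get(position, set())
    )


def rank_candidates(candidate_freqs, slots):
    """Order candidates by slot score, then frequency, then alphabetically."""
    return sorted(
        candidate_freqs,
        key=lambda cand: (-slot_score(cand, slots), -candidate_freqs[cand], cand),
    )


# Hypothetical validated terms and extracted candidates with raw frequencies.
reference_terms = ["blood pressure", "blood cell", "heart rate"]
candidates = {
    "last year": 10,
    "blood test": 3,
    "cell rate": 2,
    "heart failure": 1,
}

slots = build_slot_index(reference_terms)
ranked = rank_candidates(candidates, slots)
print(ranked)
```

Under these assumptions, "last year" tops a ranking by raw frequency but falls
to the bottom here, because none of its tokens fills a slot seen in the
reference terms. Frequency is used only to break ties between candidates with
the same slot score.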