Computational terminology and filtering of terminological information: Latest Articles

Improving term candidates selection using terminological tokens

Computational terminology and filtering of terminological information; Pub Date: 2018-05-31; DOI: 10.1075/TERM.00016.VAZ

M. Vàzquez, A. Oliver

Abstract: The identification of reliable terms from domain-specific corpora using computational methods is a task that has to be validated manually by specialists, which is highly time-consuming. To reduce this effort and improve term candidate selection, we implemented the Token Slot Recognition method, a filtering method based on terminological tokens that is used to rank term candidates extracted from domain-specific corpora. This paper presents the implementation of the term candidate filtering method we developed, applied to linguistic and statistical approaches to automatic term extraction on several domain-specific corpora in different languages. We observed that the filtering method outperforms raw frequency for term candidate selection, ranking a higher number of terms at the top of the term candidate list; for statistical term extraction, the improvement is between 15% and 25% in both precision and recall. Our analyses further revealed a reduction in the number of term candidates to be validated manually by specialists.
In conclusion, the number of term candidates extracted automatically from domain-specific corpora has been reduced significantly using the Token Slot Recognition filtering method, so term candidates can be validated easily and quickly by specialists.

Citations: 9
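The Token Slot Recognition abstract above describes ranking term candidates by terminological tokens but does not spell out the procedure. The sketch below is a minimal, hypothetical reading of that idea, not the authors' implementation: it assumes a reference list of known terms supplies the tokens observed at each position (slot) of terms of a given length, keeps a candidate only if at least one of its tokens occurs in the matching slot, and ranks kept candidates by number of slot matches and then by raw frequency. The names `learn_slot_tokens` and `tsr_filter`, and the toy data, are illustrative assumptions.

```python
from collections import defaultdict


def learn_slot_tokens(reference_terms):
    """Map (term length, position) -> set of tokens seen there in known terms.

    Hypothetical: reference_terms stands in for an existing terminology resource.
    """
    slots = defaultdict(set)
    for term in reference_terms:
        tokens = term.lower().split()
        for i, tok in enumerate(tokens):
            slots[(len(tokens), i)].add(tok)
    return slots


def tsr_filter(candidates, slots):
    """Keep candidates with at least one token in a matching slot, then rank them.

    candidates: dict mapping candidate string -> raw corpus frequency.
    Ranking (an assumption): more slot matches first, then higher frequency.
    """
    kept = []
    for cand, freq in candidates.items():
        tokens = cand.lower().split()
        hits = sum(tok in slots.get((len(tokens), i), set())
                   for i, tok in enumerate(tokens))
        if hits > 0:
            kept.append((cand, hits, freq))
    kept.sort(key=lambda item: (-item[1], -item[2]))
    return [cand for cand, _, _ in kept]


# Toy usage with invented data.
reference = ["blood pressure", "heart rate", "renal failure"]
candidates = {"blood flow": 12, "last week": 30, "heart failure": 5, "high rate": 8}
ranked = tsr_filter(candidates, learn_slot_tokens(reference))
# ranked == ["heart failure", "blood flow", "high rate"]
```

Here "last week" matches no slot and drops out, which illustrates (under these assumptions) how a token-based filter can shrink the list that specialists must validate, while still using frequency to order the survivors.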

Distributed specificity for automatic terminology extraction

Computational terminology and filtering of terminological information; Pub Date: 2018-05-31; DOI: 10.1075/TERM.00012.AMJ

Ehsan Amjadian, D. Inkpen, T. Paribakht, F. Faez

Citations: 17

Recognition of irrelevant phrases in automatically extracted lists of domain terms

Computational terminology and filtering of terminological information; Pub Date: 2018-05-31; DOI: 10.1075/TERM.00014.MYK

A. Mykowiecka, M. Marciniak, P. Rychlik

Abstract: In our paper, we address the problem of recognizing irrelevant phrases in terminology lists obtained with an automatic term extraction tool. We focus on the identification of multi-word phrases that are general terms or discourse expressions. We defined several methods based on the comparison of domain corpora and a method based on the contexts of phrases identified in a large corpus of general language. The methods were tested on Polish data: we used six domain corpora and one general corpus. Two test sets were prepared to evaluate the methods. The first consisted of many presumably irrelevant phrases, as we selected phrases that occurred in at least three domain corpora. The second set mainly consisted of domain terms, as it was composed of the top-ranked phrases automatically extracted from the analyzed domain corpora.
The results show that the task is quite hard, as inter-annotator agreement is low. Several of the tested methods achieved similar overall results, although the phrase ordering varied between methods. The most successful method, with a precision of about 0.75 on half of the tested list, was the context-based method using a modified contextual diversity coefficient. Although the methods were tested on Polish, they seem to be language-independent.

Citations: 6
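The context-based method in the abstract above relies on a modified contextual diversity coefficient whose exact definition is not given here. As a hedged illustration only, the sketch below computes a plain, unmodified diversity ratio: the number of distinct (left neighbour, right neighbour) pairs around a phrase divided by its number of occurrences in a general-language corpus. It assumes, for illustration, that phrases used in many different contexts are more likely to be general or discourse expressions. The function name `context_diversity` and the toy corpus are hypothetical.

```python
def context_diversity(tokens, phrase):
    """Ratio of distinct (left, right) neighbour pairs to occurrences of phrase.

    Plain, unmodified measure for illustration only (not the paper's
    modified coefficient). Returns 0.0 if the phrase never occurs.
    """
    n = len(phrase)
    contexts = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            # Use sentinel tokens at the corpus boundaries.
            left = tokens[i - 1] if i > 0 else "<s>"
            right = tokens[i + n] if i + n < len(tokens) else "</s>"
            contexts.append((left, right))
    if not contexts:
        return 0.0
    return len(set(contexts)) / len(contexts)


# Hypothetical toy "general corpus".
corpus = ("the results show that in this case the results show a trend "
          "while the blood pressure rose and the blood pressure rose").split()

scores = {
    "results show": context_diversity(corpus, ["results", "show"]),      # 1.0
    "blood pressure": context_diversity(corpus, ["blood", "pressure"]),  # 0.5
}
```

On this toy corpus the discourse-like phrase "results show" scores higher than the domain term "blood pressure"; a realistic setup would need a large general corpus and, per the abstract, a modified version of the coefficient.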

Clinical sublanguages

Computational terminology and filtering of terminological information; Pub Date: 2018-05-31; DOI: 10.1075/TERM.00013.GRO

L. Grön, Ann Bertels

Citations: 3