Computational terminology and filtering of terminological information: latest articles

Improving term candidates selection using terminological tokens
Computational terminology and filtering of terminological information | Pub date: 2018-05-31 | DOI: 10.1075/TERM.00016.VAZ
M. Vàzquez, A. Oliver
Abstract: The identification of reliable terms from domain-specific corpora using computational methods is a task that has to be validated manually by specialists, which is a highly time-consuming activity. To reduce this effort and improve term candidate selection, we implemented the Token Slot Recognition method, a filtering method based on terminological tokens which is used to rank extracted term candidates from domain-specific corpora. This paper presents the implementation of the term candidates filtering method we developed in linguistic and statistical approaches applied for automatic term extraction using several domain-specific corpora in different languages. We observed that the filtering method outperforms term candidate selection by ranking a higher number of terms at the top of the term candidate list than raw frequency, and for statistical term extraction the improvement is between 15% and 25% both in precision and recall. Our analyses further revealed a reduction in the number of term candidates to be validated manually by specialists.
In conclusion, the number of term candidates extracted automatically from domain-specific corpora has been reduced significantly using the Token Slot Recognition filtering method, so term candidates can be easily and quickly validated by specialists.
Citations: 9
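The abstract compares a token-based ranking of term candidates with a raw-frequency ranking in terms of precision and recall. The Python sketch below illustrates that kind of comparison under stated assumptions: the candidate frequencies, the validated term set and the set of "terminological tokens" are invented toy data, and `rank_by_term_tokens` is a deliberately simplified, hypothetical stand-in that only counts shared tokens. It is not the Token Slot Recognition method itself, whose details the abstract does not give.

```python
# Hypothetical illustration: compare a raw-frequency ranking of term candidates
# with a simplified token-based ranking, using precision and recall at the top n.


def rank_by_frequency(freqs):
    """Baseline: order candidates by raw corpus frequency, highest first."""
    return sorted(freqs, key=lambda c: (-freqs[c], c))


def rank_by_term_tokens(freqs, term_tokens):
    """Simplified stand-in (not TSR): candidates sharing more tokens with a set
    of known terminological tokens rank higher; frequency breaks ties."""
    def key(c):
        shared = sum(1 for tok in c.split() if tok in term_tokens)
        return (-shared, -freqs[c], c)
    return sorted(freqs, key=key)


def precision_recall_at_n(ranked, gold, n):
    """Precision and recall of the top-n candidates against validated terms."""
    top = ranked[:n]
    hits = sum(1 for c in top if c in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy data (invented for illustration only).
freqs = {
    "the following": 40,
    "in this study": 30,
    "heart failure": 12,
    "blood pressure": 9,
    "left ventricle": 5,
}
gold = {"heart failure", "blood pressure", "left ventricle"}
term_tokens = {"heart", "failure", "blood", "pressure", "ventricle"}

baseline = rank_by_frequency(freqs)
filtered = rank_by_term_tokens(freqs, term_tokens)
print(precision_recall_at_n(baseline, gold, 2))  # (0.0, 0.0)
print(precision_recall_at_n(filtered, gold, 2))  # (1.0, 0.6666666666666666)
```

Moving validated terms toward the top of the list is what raises precision at a fixed cut-off. The 15% to 25% improvements in the abstract come from the authors' own experiments and are not reproduced by this toy example.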
Distributed specificity for automatic terminology extraction
Computational terminology and filtering of terminological information | Pub date: 2018-05-31 | DOI: 10.1075/TERM.00012.AMJ
Ehsan Amjadian, D. Inkpen, T. Paribakht, F. Faez
Abstract: The present article explores two novel methods that integrate distributed representations with terminology extraction. Both methods assess the specificity of a word (unigram) to the target corpus by leveraging its distributed representation in the target domain as well as in the general domain. The first approach adopts this distributed specificity as a filter, and the second directly applies it to the corpus. The filter can be mounted on any other Automatic Terminology Extraction (ATE) method, allows merging any number of other ATE methods, and achieves remarkable results with minimal training. The direct approach does not perform as high as the filtering approach, but it reemphasizes that using distributed specificity as the words' representation, very little data is required to train an ATE classifier. This encourages more minimally supervised ATE algorithms in the future.
Citations: 17
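The abstract describes distributed specificity only at a high level: a word is judged by its distributed representation in the target domain together with its representation in the general domain, and the result can be used as a filter on top of another ATE method. The Python sketch below is one hypothetical reading of that idea, not the authors' model: the two-dimensional vectors are invented, the "specificity" features are simply the domain vector concatenated with the general vector (an assumption), and a nearest-centroid classifier stands in for whatever classifier the paper trains.

```python
import math

# Hypothetical sketch: represent each word by its domain-corpus vector followed
# by its general-corpus vector, train a tiny nearest-centroid classifier, and
# use it to filter candidates produced by some other ATE method.


def specificity_features(word, domain_vecs, general_vecs):
    """Concatenate the word's domain and general vectors (an assumption)."""
    return domain_vecs[word] + general_vecs[word]


def centroid(rows):
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train_nearest_centroid(examples):
    """examples: list of (features, is_term) pairs with both labels present."""
    pos = [f for f, y in examples if y]
    neg = [f for f, y in examples if not y]
    return centroid(pos), centroid(neg)


def is_term(features, model):
    """A word counts as a term if it is closer to the term centroid."""
    pos_c, neg_c = model
    return math.dist(features, pos_c) < math.dist(features, neg_c)


def filter_candidates(candidates, model, domain_vecs, general_vecs):
    """Filter mode: keep only candidates the classifier accepts."""
    return [
        w for w in candidates
        if w in domain_vecs and w in general_vecs
        and is_term(specificity_features(w, domain_vecs, general_vecs), model)
    ]


# Invented toy vectors, chosen so that terms and general words separate cleanly.
domain_vecs = {"kinase": [0.9, 0.1], "table": [0.1, 0.9],
               "protein": [0.8, 0.2], "result": [0.2, 0.8],
               "enzyme": [0.85, 0.15], "method": [0.15, 0.85]}
general_vecs = {"kinase": [0.1, 0.1], "table": [0.7, 0.7],
                "protein": [0.2, 0.2], "result": [0.8, 0.8],
                "enzyme": [0.1, 0.2], "method": [0.75, 0.7]}

# Minimal training: one labelled example per class.
train = [(specificity_features(w, domain_vecs, general_vecs), y)
         for w, y in [("kinase", True), ("table", False)]]
model = train_nearest_centroid(train)
kept = filter_candidates(["protein", "result", "enzyme", "method"],
                         model, domain_vecs, general_vecs)
print(kept)  # ['protein', 'enzyme']
```

Because the filter only consumes a candidate list, those candidates could come from any extractor, which mirrors the abstract's point that the filter can be mounted on other ATE methods.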
Recognition of irrelevant phrases in automatically extracted lists of domain terms
Computational terminology and filtering of terminological information | Pub date: 2018-05-31 | DOI: 10.1075/TERM.00014.MYK
A. Mykowiecka, M. Marciniak, P. Rychlik
Abstract: In our paper, we address the problem of recognition of irrelevant phrases in terminology lists obtained with an automatic term extraction tool. We focus on identification of multi-word phrases that are general terms or discourse expressions. We defined several methods based on comparison of domain corpora and a method based on contexts of phrases identified in a large corpus of general language. The methods were tested on Polish data. We used six domain corpora and one general corpus. Two test sets were prepared to evaluate the methods. The first one consisted of many presumably irrelevant phrases, as we selected phrases which occurred in at least three domain corpora. The second set mainly consisted of domain terms, as it was composed of the top-ranked phrases automatically extracted from the analyzed domain corpora.
The results show that the task is quite hard as the inter-annotator agreement is low. Several tested methods achieved similar overall results, although the phrase ordering varied between methods. The most successful method, with a precision of about 0.75 on half of the tested list, was the context based method using a modified contextual diversity coefficient.
Although the methods were tested on Polish, they seem to be language independent.
Citations: 6
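Two ideas from this abstract are easy to make concrete: the first test set was built from phrases that occurred in at least three domain corpora, and the best method scored phrases by how diverse their contexts are in a general corpus. The Python sketch below shows both under stated assumptions: the corpus contents are invented, and `context_diversity` is a plain, hypothetical ratio of distinct (left, right) neighbour pairs to occurrences, not the modified contextual diversity coefficient used in the paper.

```python
from collections import defaultdict

# Hypothetical sketch of (1) selecting phrases shared by several domain corpora
# and (2) a simple context-diversity score computed on a general corpus.


def phrases_in_many_corpora(corpus_phrases, min_corpora=3):
    """corpus_phrases maps a corpus name to the set of phrases extracted from it."""
    seen_in = defaultdict(set)
    for corpus, phrases in corpus_phrases.items():
        for p in phrases:
            seen_in[p].add(corpus)
    return {p for p, corpora in seen_in.items() if len(corpora) >= min_corpora}


def context_diversity(phrase, sentences, window=1):
    """Distinct (left, right) contexts divided by the number of occurrences.
    Higher values mean the phrase appears in more varied surroundings."""
    target = phrase.split()
    k = len(target)
    contexts, occurrences = set(), 0
    for sent in sentences:
        toks = sent.split()
        for i in range(len(toks) - k + 1):
            if toks[i:i + k] == target:
                occurrences += 1
                left = tuple(toks[max(0, i - window):i])
                right = tuple(toks[i + k:i + k + window])
                contexts.add((left, right))
    return len(contexts) / occurrences if occurrences else 0.0


# Invented toy data.
corpus_phrases = {
    "medicine": {"blood pressure", "in this paper", "results show"},
    "law": {"court ruling", "in this paper", "results show"},
    "physics": {"dark matter", "in this paper"},
    "economics": {"interest rate", "results show"},
}
general_corpus = [
    "high blood pressure is",
    "high blood pressure is",
    "we argue in this paper that",
    "as described in this paper the",
]

print(sorted(phrases_in_many_corpora(corpus_phrases)))  # ['in this paper', 'results show']
print(context_diversity("blood pressure", general_corpus))  # 0.5
print(context_diversity("in this paper", general_corpus))   # 1.0
```

In this toy corpus the discourse expression "in this paper" scores higher than the domain term "blood pressure", which is the kind of signal a context-based method can use to flag general phrases.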
Clinical sublanguages
Computational terminology and filtering of terminological information | Pub date: 2018-05-31 | DOI: 10.1075/TERM.00013.GRO
L. Grön, Ann Bertels
Abstract: Due to its specific linguistic properties, the language found in clinical records has been characterized as a distinct sublanguage. Even within the clinical domain, though, there are major differences in language use, which has led to more fine-grained distinctions based on medical fields and document types. However, previous work has mostly neglected the influence of term variation. By contrast, we propose to integrate the potential for term variation in the characterization of clinical sublanguages. By analyzing a corpus of clinical records, we show that the different sections of these records vary systematically with regard to their lexical, terminological and semantic composition, as well as their potential for term variation. These properties have implications for automatic term recognition, as they influence the performance of frequency-based term weighting.
Citations: 3
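The abstract links term variation to the behaviour of frequency-based term weighting. A minimal Python illustration, with invented mentions and a hypothetical variant map, shows the mechanism: when variants of one concept are counted separately their frequency is split, and merging them under a canonical form can change which term ranks first in a section.

```python
from collections import Counter

# Hypothetical illustration: raw frequency counts versus counts after merging
# term variants under a canonical form.


def term_frequencies(mentions, variant_map=None):
    """Count term mentions, mapping known variants to a canonical form."""
    variant_map = variant_map or {}
    return Counter(variant_map.get(m, m) for m in mentions)


# Invented mentions from one section of a clinical record.
mentions = [
    "myocardial infarction", "MI", "MI", "heart attack",
    "hypertension", "hypertension", "hypertension",
]
# Invented variant map (hypothetical); in practice it would come from a
# terminology resource.
variants = {"MI": "myocardial infarction", "heart attack": "myocardial infarction"}

raw = term_frequencies(mentions)
merged = term_frequencies(mentions, variants)
print(raw.most_common(1))     # [('hypertension', 3)]
print(merged.most_common(1))  # [('myocardial infarction', 4)]
```

The more variation a section contains, the more its raw counts are split in this way; the abstract reports that record sections differ systematically in their potential for term variation.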