Improving term candidates selection using terminological tokens

M. Vàzquez, A. Oliver
DOI: 10.1075/TERM.00016.VAZ (https://doi.org/10.1075/TERM.00016.VAZ)
Published in: Computational terminology and filtering of terminological information
Publication date: 2018-05-31
Citation count: 9

Abstract

Terms identified in domain-specific corpora by computational methods must be validated manually by specialists, which is highly time-consuming. To reduce this effort and improve term candidate selection, we implemented the Token Slot Recognition method, a filtering method based on terminological tokens that is used to rank term candidates extracted from domain-specific corpora. This paper presents the implementation of this term candidate filtering method within linguistic and statistical approaches to automatic term extraction, using several domain-specific corpora in different languages. We observed that the filtering method outperforms raw frequency in term candidate selection, ranking a higher number of terms at the top of the term candidate list; for statistical term extraction, the improvement is between 15% and 25% in both precision and recall. Our analyses further revealed a reduction in the number of term candidates to be validated manually by specialists. In conclusion, the Token Slot Recognition filtering method significantly reduces the number of term candidates extracted automatically from domain-specific corpora, so specialists can validate them easily and quickly.
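The abstract does not spell out how Token Slot Recognition scores a candidate, so the following is only a rough sketch of the general idea of token-based filtering, not the authors' implementation. It assumes a small list of already validated terms (`known_terms`), scores each candidate by the share of its tokens that occur in those terms, and compares precision in the top N against a plain raw-frequency ranking. All names, the scoring rule, and the toy data are hypothetical.

```python
def token_score(candidate, known_tokens):
    """Fraction of the candidate's tokens found among known terminological tokens.

    Assumption: this simple overlap ratio stands in for the paper's scoring,
    which the abstract does not describe.
    """
    tokens = candidate.lower().split()
    if not tokens:
        return 0.0
    return sum(t in known_tokens for t in tokens) / len(tokens)


def rank_by_tokens(freqs, known_terms):
    """Rank candidates by token score, using raw frequency as a tie-breaker."""
    known_tokens = {tok for term in known_terms for tok in term.lower().split()}
    return sorted(
        freqs,
        key=lambda c: (-token_score(c, known_tokens), -freqs[c], c),
    )


def rank_by_frequency(freqs):
    """Baseline: rank candidates by raw frequency only."""
    return sorted(freqs, key=lambda c: (-freqs[c], c))


def precision_at(ranked, gold, n):
    """Share of the top-n candidates that are true terms."""
    top = ranked[:n]
    return sum(c in gold for c in top) / len(top) if top else 0.0


# Toy data (invented for illustration only).
freqs = {
    "of the": 50,
    "in this": 30,
    "blood pressure": 12,
    "heart rate": 9,
    "blood test": 4,
}
known_terms = ["blood cell", "heart failure", "pressure ulcer"]
gold = {"blood pressure", "heart rate", "blood test"}

filtered = rank_by_tokens(freqs, known_terms)
baseline = rank_by_frequency(freqs)

print(filtered)
print(precision_at(filtered, gold, 3), precision_at(baseline, gold, 3))
```

On this toy data the token-based ranking moves the three true terms ahead of the frequent non-terms, which is the kind of effect the abstract reports; the numbers here say nothing about the paper's actual results.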