Automatic expansion of domain-specific lexicons by term categorization

Henri Avancini, A. Lavelli, F. Sebastiani, Roberto Zanoli
ACM Trans. Speech Lang. Process., May 2006 (journal article). DOI: 10.1145/1138379.1138380
Citations: 20

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each $c_i$ in a predefined set $C = \{c_1, \ldots, c_m\}$ of semantic domains, an initial lexicon $L^i_0$ into a larger lexicon $L^i_1$. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks, in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.
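The abstract describes the method only at a high level. The following is a minimal sketch of the idea under assumptions that are ours, not the paper's: each term is a binary vector with one component per document (1 if the term occurs in that document), the seed lexicon $L^i_0$ supplies positive examples, terms from other domains supply negatives, and the unspecified "boosting-based learning device" is stood in for by plain discrete AdaBoost with one-feature decision stumps. The function names (`term_vectors`, `train_adaboost`, `expand_lexicon`) and the toy corpus are hypothetical. The authors' weighting scheme and boosting variant may well differ.

```python
"""Sketch: lexicon expansion by term categorization (assumed setup, not the paper's code).

Terms are represented as vectors in a space of documents; one binary classifier
per domain is trained with discrete AdaBoost over single-feature stumps.
"""
import math
from typing import Dict, List, Set, Tuple

Stump = Tuple[int, int, float]  # (document index j, sign s, weight alpha)


def term_vectors(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to a binary vector indexed by document (the 'dual' representation)."""
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    return {t: [1 if t in ds else 0 for ds in doc_sets] for t in vocab}


def train_adaboost(X: List[List[int]], y: List[int], rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost with stumps h(x) = s if x[j] == 1 else -s; labels y are in {-1, +1}."""
    n, dims = len(X), len(X[0])
    w = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        best = None  # (weighted error, document index, sign)
        for j in range(dims):
            for s in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w) if (s if xi[j] else -s) != yi)
                if best is None or err < best[0]:
                    best = (err, j, s)
        err, j, s = best
        if err >= 0.5:
            break  # no stump does better than chance
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        model.append((j, s, alpha))
        if err == 0:
            break  # a perfect stump: further rounds would only repeat it
        # Reweight: misclassified examples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * (s if xi[j] else -s)) for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def score(model: List[Stump], x: List[int]) -> float:
    """Weighted vote of the stumps; a positive score means 'belongs to the domain'."""
    return sum(alpha * (s if x[j] else -s) for j, s, alpha in model)


def expand_lexicon(docs: List[str], seed: Set[str], negatives: Set[str],
                   rounds: int = 10) -> Set[str]:
    """Train on seed (L_0) vs. negatives, then return unlabeled terms classified as positive."""
    vecs = term_vectors(docs)
    train = [t for t in sorted(seed | negatives) if t in vecs]
    model = train_adaboost([vecs[t] for t in train],
                           [1 if t in seed else -1 for t in train], rounds)
    unlabeled = [t for t in vecs if t not in seed and t not in negatives]
    return {t for t in unlabeled if score(model, vecs[t]) > 0}


if __name__ == "__main__":
    toy_docs = [
        "goal match striker referee league",
        "bank loan interest rate credit",
        "match goal league",
        "loan rate bank",
    ]
    print(sorted(expand_lexicon(toy_docs, {"goal", "match"}, {"bank", "loan"})))
    # prints ['league', 'referee', 'striker']
```

Because each vector component corresponds to a document, every stump is a rule of the form "the term does (or does not) occur in document j", and the learned term classifier is a weighted vote over such rules. In the toy run, the first document contains every sport term and no finance term, so a single stump separates the training terms perfectly and boosting stops after one round.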