Henri Avancini, A. Lavelli, F. Sebastiani, Roberto Zanoli
{"title":"Automatic expansion of domain-specific lexicons by term categorization","authors":"Henri Avancini, A. Lavelli, F. Sebastiani, Roberto Zanoli","doi":"10.1145/1138379.1138380","DOIUrl":null,"url":null,"abstract":"We discuss an approach to the automatic expansion of<i>domain-specific lexicons</i>, that is, to the problem ofextending, for each <i>c</i><sub><i>i</i></sub> in a predefined set<i>C</i> ={<i>c</i><sub>1</sub>,…,<i>c</i><sub><i>m</i></sub>} ofsemantic <i>domains</i>, an initial lexicon<i>L</i><sup><i>i</i></sup><sub>0</sub> into a larger lexicon<i>L</i><sup><i>i</i></sup><sub>1</sub>. Our approach relies on<i>term categorization</i>, defined as the task of labelingpreviously unlabeled terms according to a predefined set ofdomains. We approach this as a supervised learning problem in whichterm classifiers are built using the initial lexicons as trainingdata. Dually to classic text categorization tasks in whichdocuments are represented as vectors in a space of terms, werepresent terms as vectors in a space of documents. We present theresults of a number of experiments in which we use a boosting-basedlearning device for training our term classifiers. We test theeffectiveness of our method by using WordNetDomains, a well-knownlarge set of domain-specific lexicons, as a benchmark. Ourexperiments are performed using the documents in the Reuters CorpusVolume 1 as implicit representations for our terms.","PeriodicalId":412532,"journal":{"name":"ACM Trans. Speech Lang. Process.","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Trans. Speech Lang. 
Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1138379.1138380","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 20
Abstract
We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each c_i in a predefined set C = {c_1, …, c_m} of semantic domains, an initial lexicon L^i_0 into a larger lexicon L^i_1. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.
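
The dual representation described in the abstract, in which terms are vectors in a space of documents, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy corpus, the seed lexicons, whitespace tokenization, raw occurrence counts as weights, and the helper names `term_vectors` and `training_set`. The boosting-based learner itself is not shown; the sketch only prepares the training data that such a learner would consume.

```python
from collections import Counter


def term_vectors(docs):
    """Map each term to its occurrence counts, one entry per document.

    This is the transpose of the usual document-by-term matrix:
    documents act as features, and terms are the objects to classify.
    Whitespace tokenization and raw counts are simplifying assumptions.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in vocab}


def training_set(vectors, seed, domain):
    """Build (X, y) for one domain from the seed lexicons.

    Terms listed in any seed lexicon are the labeled examples; a term is
    positive (1) if it belongs to the lexicon of `domain`, else negative (0).
    """
    labeled = sorted(t for t in set().union(*seed.values()) if t in vectors)
    X = [vectors[t] for t in labeled]
    y = [int(t in seed[domain]) for t in labeled]
    return labeled, X, y


# Hypothetical toy corpus standing in for a real document collection.
docs = [
    "the bank raised interest rates",
    "the striker scored a goal",
    "interest on the loan rose",
    "the goal keeper saved a penalty",
]

# Hypothetical initial lexicons, one per domain.
seed = {
    "finance": {"bank", "loan"},
    "sport": {"striker", "penalty"},
}

vectors = term_vectors(docs)
labeled, X, y = training_set(vectors, seed, "finance")

# Terms outside every seed lexicon are the candidates for lexicon expansion.
unlabeled = sorted(t for t in vectors if t not in labeled)
```

A binary classifier trained on `X` and `y` for each domain would then be applied to the vectors of the terms in `unlabeled`, and terms it accepts would be added to that domain's lexicon.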