Title: Topic hierarchy generation via linear discriminant projection
Authors: Tao Li, Shenghuo Zhu, M. Ogihara
Published in: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Publication date: 2003-07-28
DOI: https://doi.org/10.1145/860435.860531
Citations: 18
Abstract
Text categorization has been receiving more and more attention with the ever-increasing growth of online information. Automated text categorization is generally a supervised learning problem, defined as the problem of assigning predefined category labels to new documents based on the likelihood suggested by labeled documents. Most studies in the area have focused on flat classification, where the predefined categories are treated individually and separately [5]. As the available information increases and the number of categories grows large, it becomes much more difficult to browse and search categories. The most successful paradigm for organizing this mass of information and making it comprehensible is to categorize documents according to their topics, where the topics are organized in a hierarchy of increasing specificity [3]. Hierarchical structures identify the relationships of dependence between the categories and provide a valuable information source for many problems. Recently, several researchers have investigated the use of hierarchies for text classification and obtained promising results [1, 4]. However, little has been done to explore approaches for automatically generating topic hierarchies. Most of the reported techniques have been applied to existing hierarchically structured corpora. Automatic hierarchy generation has several motivations. First, manually building hierarchies is an expensive task, since it requires domain experts to evaluate the documents’ relevance to the topics. Second, existing hierarchies are optimized for human use based on “human semantics”, but not necessarily for classifier use. Automatically generated hierarchies can be incorporated into various classification methods