Topic hierarchy generation via linear discriminant projection

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. Pub Date: 2003-07-28. DOI: 10.1145/860435.860531

Tao Li, Shenghuo Zhu, M. Ogihara


Citations: 18

Abstract

Text categorization has received growing attention as the volume of online information keeps increasing. Automated text categorization is generally a supervised learning problem: assigning predefined category labels to new documents based on the likelihood suggested by labeled documents. Most studies in the area have focused on flat classification, where the predefined categories are treated individually and separately [5]. As the available information increases and the number of categories grows large, browsing and searching the categories becomes much more difficult. The most successful paradigm for organizing this mass of information and making it comprehensible is to categorize documents according to their topics, with the topics organized in a hierarchy of increasing specificity [3]. Hierarchical structures identify the dependence relationships between categories and provide a valuable source of information for many problems. Recently, several researchers have investigated the use of hierarchies for text classification and obtained promising results [1, 4]. However, little has been done to explore approaches for automatically generating topic hierarchies; most of the reported techniques have been applied to existing hierarchically structured corpora. Automatic hierarchy generation has several motivations. First, manually building hierarchies is expensive, since it requires domain experts to evaluate the documents' relevance to the topics. Second, existing hierarchies are optimized for human use based on "human semantics", but not necessarily for classifier use. Automatically generated hierarchies can be incorporated into various classification methods
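
To make the idea of generating a topic hierarchy automatically more concrete, here is a minimal sketch. The abstract does not describe the authors' algorithm, so this is not their method: it skips the linear discriminant projection named in the title and simply merges the two most similar categories, by cosine similarity of their centroids, until a single root remains. The function names (`cosine`, `build_hierarchy`) and the toy centroid values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_hierarchy(centroids):
    """Greedily merge the two most similar nodes until one root remains.

    centroids: dict mapping category name -> centroid vector (list of floats).
    Returns a nested tuple: leaves are category names, internal nodes are pairs.
    """
    # Each node is (tree, centroid, number_of_leaf_categories).
    nodes = [(name, list(vec), 1) for name, vec in centroids.items()]
    while len(nodes) > 1:
        # Pick the pair of nodes whose centroids are most similar.
        i, j = max(
            combinations(range(len(nodes)), 2),
            key=lambda p: cosine(nodes[p[0]][1], nodes[p[1]][1]),
        )
        (t1, c1, n1), (t2, c2, n2) = nodes[i], nodes[j]
        # Size-weighted average of the two centroids represents the merged node.
        merged = [(a * n1 + b * n2) / (n1 + n2) for a, b in zip(c1, c2)]
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)]
        nodes.append(((t1, t2), merged, n1 + n2))
    return nodes[0][0]


# Hypothetical toy centroids over three made-up term dimensions.
toy = {
    "soccer": [0.9, 0.1, 0.0],
    "baseball": [0.8, 0.2, 0.1],
    "politics": [0.0, 0.1, 0.9],
    "economy": [0.2, 0.3, 0.7],
}
print(build_hierarchy(toy))
```

On this toy input the two sports categories end up under one internal node and the two news categories under another. A classifier could then decide between the two branches first and between their children second, instead of choosing among all four categories at once as in flat classification.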


