Topic hierarchy generation via linear discriminant projection

Tao Li, Shenghuo Zhu, M. Ogihara
Published in: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Publication date: 2003-07-28
DOI: https://doi.org/10.1145/860435.860531
Citations: 18

Abstract

Text categorization has received growing attention as the volume of online information continues to increase. Automated text categorization is generally a supervised learning problem: assigning predefined category labels to new documents based on the likelihood suggested by labeled documents. Most studies in the area have focused on flat classification, where the predefined categories are treated individually and separately [5]. As the available information increases and the number of categories grows large, browsing and searching the categories becomes much more difficult. The most successful paradigm for organizing this mass of information and making it comprehensible is to categorize documents according to their topics, where the topics are organized in a hierarchy of increasing specificity [3]. Hierarchical structures identify the dependence relationships between categories and provide a valuable source of information for many problems. Recently, several researchers have investigated the use of hierarchies for text classification and obtained promising results [1, 4]. However, little has been done to explore approaches for automatically generating topic hierarchies. Most of the reported techniques have been applied to corpora that already have a hierarchical structure. Automatic hierarchy generation has several motivations. First, manually building hierarchies is expensive, since it requires domain experts to evaluate the documents' relevance to the topics. Second, existing hierarchies are optimized for human use based on "human semantics", but not necessarily for classifier use. Automatically generated hierarchies can be incorporated into various classification methods