TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian M. Sadler, M. Vanni, Jiawei Han
{"title":"TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering","authors":"Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian M. Sadler, M. Vanni, Jiawei Han","doi":"10.1145/3219819.3220064","DOIUrl":null,"url":null,"abstract":"Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.","PeriodicalId":322066,"journal":{"name":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"80","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3219819.3220064","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 80

Abstract

Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.
TaxoGen:基于自适应词嵌入和聚类的无监督主题分类法构建
分类构建是文本语料库语义分析的基础任务,也是信息过滤、推荐、Web搜索等应用的重要步骤。现有的基于模式的方法提取上下词对,然后将这些对组织成一个分类法。然而,通过将每个术语视为一个独立的概念节点,他们忽略了术语之间的主题接近性和语义相关性。在本文中,我们提出了一种构建主题分类法的方法,其中每个节点代表一个概念主题,并被定义为语义上连贯的概念术语的集群。我们的方法TaxoGen使用术语嵌入和分层聚类以递归方式构建主题分类法。为了保证递归过程的质量,它包括:(1)自适应球形聚类模块,用于在将粗主题拆分为细粒度主题时将术语分配到适当的级别;(2)局部嵌入模块,用于学习在不同分类层次上保持较强判别能力的词嵌入。我们在两个真实数据集上的实验证明了TaxoGen与基线方法相比的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信