CC-Top:动态主题发现的约束聚类

Jann Goschenhofer, Pranav Ragupathy, C. Heumann, Bernd Bischl, M. Aßenmacher
{"title":"CC-Top:动态主题发现的约束聚类","authors":"Jann Goschenhofer, Pranav Ragupathy, C. Heumann, Bernd Bischl, M. Aßenmacher","doi":"10.18653/v1/2022.evonlp-1.5","DOIUrl":null,"url":null,"abstract":"Research on multi-class text classification of short texts mainly focuses on supervised (transfer) learning approaches, requiring a finite set of pre-defined classes which is constant over time. This work explores deep constrained clustering (CC) as an alternative to supervised learning approaches in a setting with a dynamically changing number of classes, a task we introduce as dynamic topic discovery (DTD).We do so by using pairwise similarity constraints instead of instance-level class labels which allow for a flexible number of classes while exhibiting a competitive performance compared to supervised approaches. First, we substantiate this through a series of experiments and show that CC algorithms exhibit a predictive performance similar to state-of-the-art supervised learning algorithms while requiring less annotation effort.Second, we demonstrate the overclustering capabilities of deep CC for detecting topics in short text data sets in the absence of the ground truth class cardinality during model training.Third, we showcase that these capabilities can be leveraged for the DTD setting as a step towards dynamic learning over time and finally, we release our codebase to nurture further research in this area.","PeriodicalId":158578,"journal":{"name":"Proceedings of the The First Workshop on Ever Evolving NLP (EvoNLP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CC-Top: Constrained Clustering for Dynamic Topic Discovery\",\"authors\":\"Jann Goschenhofer, Pranav Ragupathy, C. Heumann, Bernd Bischl, M. Aßenmacher\",\"doi\":\"10.18653/v1/2022.evonlp-1.5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Research on multi-class text classification of short texts mainly focuses on supervised (transfer) learning approaches, requiring a finite set of pre-defined classes which is constant over time. This work explores deep constrained clustering (CC) as an alternative to supervised learning approaches in a setting with a dynamically changing number of classes, a task we introduce as dynamic topic discovery (DTD).We do so by using pairwise similarity constraints instead of instance-level class labels which allow for a flexible number of classes while exhibiting a competitive performance compared to supervised approaches. First, we substantiate this through a series of experiments and show that CC algorithms exhibit a predictive performance similar to state-of-the-art supervised learning algorithms while requiring less annotation effort.Second, we demonstrate the overclustering capabilities of deep CC for detecting topics in short text data sets in the absence of the ground truth class cardinality during model training.Third, we showcase that these capabilities can be leveraged for the DTD setting as a step towards dynamic learning over time and finally, we release our codebase to nurture further research in this area.\",\"PeriodicalId\":158578,\"journal\":{\"name\":\"Proceedings of the The First Workshop on Ever Evolving NLP (EvoNLP)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the The First Workshop on Ever Evolving NLP (EvoNLP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/2022.evonlp-1.5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the The First Workshop on Ever Evolving NLP (EvoNLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/2022.evonlp-1.5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

短文本多类文本分类的研究主要集中在监督(迁移)学习方法上,需要有限的预定义类集,这些类集随着时间的推移是恒定的。这项工作探索了深度约束聚类(CC)作为监督学习方法的替代方法,在类数量动态变化的环境中,我们将这种任务称为动态主题发现(DTD)。我们通过使用两两相似约束而不是实例级类标签来实现这一点,这允许灵活的类数量,同时与监督方法相比表现出具有竞争力的性能。首先,我们通过一系列实验证实了这一点,并表明CC算法表现出与最先进的监督学习算法相似的预测性能,同时需要更少的注释工作。其次,我们展示了深度CC在模型训练期间缺乏基础真类基数的情况下在短文本数据集中检测主题的过度聚类能力。第三,我们展示了这些功能可以用于DTD设置,作为实现动态学习的一个步骤,最后,我们发布了我们的代码库,以促进该领域的进一步研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
CC-Top: Constrained Clustering for Dynamic Topic Discovery
Research on multi-class text classification of short texts mainly focuses on supervised (transfer) learning approaches, requiring a finite set of pre-defined classes which is constant over time. This work explores deep constrained clustering (CC) as an alternative to supervised learning approaches in a setting with a dynamically changing number of classes, a task we introduce as dynamic topic discovery (DTD).We do so by using pairwise similarity constraints instead of instance-level class labels which allow for a flexible number of classes while exhibiting a competitive performance compared to supervised approaches. First, we substantiate this through a series of experiments and show that CC algorithms exhibit a predictive performance similar to state-of-the-art supervised learning algorithms while requiring less annotation effort.Second, we demonstrate the overclustering capabilities of deep CC for detecting topics in short text data sets in the absence of the ground truth class cardinality during model training.Third, we showcase that these capabilities can be leveraged for the DTD setting as a step towards dynamic learning over time and finally, we release our codebase to nurture further research in this area.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信