A Multilevel Clustering Model for Coherent Topic Discovery in Short Texts

Emmanuel Maithya, L. Nderu, D. Njagi
Published in: 2022 IST-Africa Conference (IST-Africa)
DOI: 10.23919/IST-Africa56635.2022.9845648 (https://doi.org/10.23919/IST-Africa56635.2022.9845648)
Listed publication date: 2020-05-22
Citations: 0

Abstract

Deducing meaning from collections of documents has become an increasingly important task for decision makers in both industry and academia. To address this challenge, topic modelling techniques have been developed to identify and isolate the words that most closely summarise the contents of a document collection. However, the topics these techniques extract from collections of short texts achieve low coherence scores, which defeats the purpose for which the techniques were created. In this paper, we propose the n-gram_cluster model, which discovers topics by exploiting the semantic closeness between n-grams and the word clusters formed from collections of those n-grams at different levels. The model is able to discover semantically coherent topics from collections of short texts. We evaluated our model against three other conventional models and found that it forms topics that achieve comparatively higher coherence scores.
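The abstract compares models by topic coherence but does not name the coherence measure. As a minimal sketch, assuming the UMass measure (Mimno et al., 2011), one common choice, the Python below scores a topic's top words by how often they co-occur in the same documents of a reference corpus. The function names and the toy corpus are hypothetical illustrations, not the authors' n-gram_cluster implementation or their evaluation code.

```python
import math


def document_index(docs):
    """Map each word to the set of indices of documents that contain it."""
    index = {}
    for d, doc in enumerate(docs):
        for word in set(doc):
            index.setdefault(word, set()).add(d)
    return index


def umass_coherence(topic_words, docs):
    """UMass coherence of one topic (assumed measure, for illustration only).

    topic_words: the topic's top words, ordered from most to least probable.
    docs: tokenised short texts used as the reference corpus.
    Sums log((D(w_i, w_j) + 1) / D(w_j)) over all pairs with j < i, where
    D(w) counts documents containing w and D(w_i, w_j) counts documents
    containing both. Scores closer to 0 indicate a more coherent topic.
    """
    index = document_index(docs)
    score = 0.0
    for i in range(1, len(topic_words)):
        docs_i = index.get(topic_words[i], set())
        for j in range(i):
            docs_j = index.get(topic_words[j], set())
            if not docs_j:
                raise ValueError(f"{topic_words[j]!r} does not occur in the corpus")
            co_occurrences = len(docs_i & docs_j)
            score += math.log((co_occurrences + 1) / len(docs_j))
    return score


# Hypothetical toy corpus of short texts, already tokenised.
docs = [
    ["rain", "flood", "river"],
    ["rain", "flood"],
    ["football", "goal"],
    ["football", "goal", "rain"],
]
print(umass_coherence(["rain", "flood"], docs))
print(umass_coherence(["rain", "goal"], docs))
```

On this toy corpus the topic "rain, flood" scores 0.0, while "rain, goal" scores log(2/3), roughly -0.41, because "goal" co-occurs with "rain" in only one of the three documents containing "rain". UMass is not normalised for the number of top words, so compare topics of the same size.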