LDA与NMF在大型文本流数据事件检测中的比较

Pranav Suri, N. Roy
{"title":"LDA与NMF在大型文本流数据事件检测中的比较","authors":"Pranav Suri, N. Roy","doi":"10.1109/CIACT.2017.7977281","DOIUrl":null,"url":null,"abstract":"Usage of social network for topic identification has become essential when dealing with event detection, especially when the events impact the society. In order to address this task, machine learning algorithms and natural language processing techniques have been extensively used. In this paper, an approach to obtain meaningful data from Twitter has been discussed. Further, Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) have been used in order to detect topics from this textual data obtained from Twitter along with RSS feed of news headlines. The observed results show that both the algorithms perform well in detecting topics from text streams, the results of LDA being more semantically interpretable while NMF being faster of the two.","PeriodicalId":218079,"journal":{"name":"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)","volume":"2015 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"Comparison between LDA & NMF for event-detection from large text stream data\",\"authors\":\"Pranav Suri, N. Roy\",\"doi\":\"10.1109/CIACT.2017.7977281\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Usage of social network for topic identification has become essential when dealing with event detection, especially when the events impact the society. In order to address this task, machine learning algorithms and natural language processing techniques have been extensively used. In this paper, an approach to obtain meaningful data from Twitter has been discussed. Further, Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) have been used in order to detect topics from this textual data obtained from Twitter along with RSS feed of news headlines. The observed results show that both the algorithms perform well in detecting topics from text streams, the results of LDA being more semantically interpretable while NMF being faster of the two.\",\"PeriodicalId\":218079,\"journal\":{\"name\":\"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)\",\"volume\":\"2015 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIACT.2017.7977281\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIACT.2017.7977281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 22

摘要

在处理事件检测时,特别是当事件对社会产生影响时,利用社交网络进行主题识别已经变得必不可少。为了解决这个问题,机器学习算法和自然语言处理技术被广泛使用。本文讨论了一种从Twitter获取有意义数据的方法。此外,使用潜在狄利克雷分配(LDA)和非负矩阵分解(NMF)从Twitter以及新闻标题的RSS提要获得的文本数据中检测主题。实验结果表明,两种算法都能很好地从文本流中检测主题,LDA算法的结果语义可解释性更好,NMF算法的结果语义可解释性更高。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Comparison between LDA & NMF for event-detection from large text stream data
Usage of social network for topic identification has become essential when dealing with event detection, especially when the events impact the society. In order to address this task, machine learning algorithms and natural language processing techniques have been extensively used. In this paper, an approach to obtain meaningful data from Twitter has been discussed. Further, Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) have been used in order to detect topics from this textual data obtained from Twitter along with RSS feed of news headlines. The observed results show that both the algorithms perform well in detecting topics from text streams, the results of LDA being more semantically interpretable while NMF being faster of the two.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信