Online Hot Topic Discovery and Hotness Evaluation

Chunhui Deng, Huifang Deng, Yuxin Liu
Proceedings of the 3rd International Conference on Computer Science and Application Engineering, published 2019-10-22
DOI: 10.1145/3331453.3361319 (https://doi.org/10.1145/3331453.3361319)
Citations: 1

Abstract

This paper analyzes the shortcomings of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) method and proposes an improved weight calculation formula: a new TF-IDF that accounts for location information, named entities, and feature-term burstiness, and that updates feature-term weights in real time. This improves the accuracy of the news representation model to some extent. An incremental k-means clustering method based on a time window and a multi-center topic model is then proposed to tackle the topic center drift problem and to reduce the error caused by an inadequate topic model, thereby improving clustering accuracy. Finally, an improved energy accumulation formula is defined and, based on media attention, topic competition, topic burstiness magnitude, and topic cohesiveness, a topic hotness evaluation model is constructed to quantify topic hotness and to better distinguish hot topics from cold ones. The experimental results demonstrate the effectiveness of the proposed approaches and models.
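
The abstract names the ingredients of the pipeline but gives no formulas, so the following Python sketch only illustrates the general idea and should not be read as the authors' method. It assumes that "location information" means where a term occurs in a story (here, its title), that named-entity and burstiness factors act as simple multiplicative boosts, and that stories are assigned one at a time to topics that keep several centers and expire outside a time window. Centers here are stored story vectors rather than running means, so this is single-pass incremental clustering, not k-means proper. All names and parameters (`title_boost`, `entity_boost`, `burst`, `window`, `threshold`, `max_centers`, `new_center_below`) are hypothetical.

```python
import math
from collections import Counter


def weighted_tfidf(doc_tokens, title_tokens, entities, doc_freq, n_docs,
                   burst=None, title_boost=2.0, entity_boost=1.5):
    """Return an L2-normalised {term: weight} vector for one news story.

    The boosts are illustrative assumptions, not the paper's formula.
    """
    if not doc_tokens:
        return {}
    burst = burst or {}
    title, ents = set(title_tokens), set(entities)
    weights = {}
    for term, count in Counter(doc_tokens).items():
        # Smoothed IDF so unseen terms do not divide by zero.
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        w = (count / len(doc_tokens)) * idf
        if term in title:           # assumed reading of "location information"
            w *= title_boost
        if term in ents:            # named-entity factor
            w *= entity_boost
        w *= 1.0 + burst.get(term, 0.0)  # burstiness factor, 0.0 if unknown
        weights[term] = w
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class Topic:
    """A topic described by several centers instead of a single centroid."""

    def __init__(self, vec, t):
        self.centers = [vec]
        self.last_seen = t


def assign(topics, vec, t, window=3, threshold=0.3,
           max_centers=3, new_center_below=0.6):
    """Attach a story to the most similar recently active topic, or open a new one."""
    best, best_sim = None, 0.0
    for topic in topics:
        if t - topic.last_seen > window:   # topic fell out of the time window
            continue
        sim = max(cosine(vec, c) for c in topic.centers)
        if sim > best_sim:
            best, best_sim = topic, sim
    if best is None or best_sim < threshold:
        best = Topic(vec, t)
        topics.append(best)
        return best
    best.last_seen = t
    # A story that matches only loosely becomes an extra center, so the topic
    # can follow a shifting story line without replacing its original center.
    if best_sim < new_center_below and len(best.centers) < max_centers:
        best.centers.append(vec)
    return best
```

To use the sketch, build one vector per incoming story with `weighted_tfidf` and pass it to `assign` together with a running list of topics and the story's time step. The hotness evaluation model is not sketched here because the abstract does not describe how its four factors are combined.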