Online Hot Topic Discovery and Hotness Evaluation
Authors: Chunhui Deng, Huifang Deng, Yuxin Liu
Published in: Proceedings of the 3rd International Conference on Computer Science and Application Engineering, 2019-10-22
DOI: 10.1145/3331453.3361319 (https://doi.org/10.1145/3331453.3361319)

In this paper, we analyze the inadequacies of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) method and, taking into account location information, named entities, and feature-term burstiness, put forward an improved weight calculation formula, i.e., a new TF-IDF that updates feature-term weights in real time. In this way, the accuracy of the news representation model can be improved to some extent. We propose incremental k-means clustering based on a time window and a multi-center topic model to tackle the topic center drift problem and reduce the error caused by an inadequate topic model, thereby improving clustering accuracy. Finally, we define an improved energy accumulation formula and, based on media attention, topic competition, topic burstiness magnitude, and topic cohesiveness, construct a topic hotness evaluation model that quantifies topic hotness and better distinguishes hot topics from cold ones. The experimental results demonstrate the effectiveness of our approaches and models.
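
The abstract does not give the improved weighting formula itself, so the following is only a minimal sketch of the general idea: a classic TF-IDF score adjusted by term location, named-entity status, and burstiness. The function name `weighted_tfidf`, the multiplicative form, and the boost values `title_boost` and `entity_boost` are illustrative assumptions, not the authors' formula.

```python
import math
from collections import Counter


def weighted_tfidf(doc_tokens, title_tokens, entities, doc_freq, n_docs,
                   burst=None, title_boost=1.5, entity_boost=1.3):
    """Return {term: weight} for one news item.

    Hypothetical scheme (not the paper's formula): smoothed TF-IDF
    multiplied by a boost for terms that occur in the title (location),
    a boost for named entities, and a factor (1 + burst score).
    """
    burst = burst or {}
    tf = Counter(doc_tokens)
    total = sum(tf.values())
    weights = {}
    for term, count in tf.items():
        # Smoothed IDF; doc_freq can be updated as new documents arrive,
        # which is one way to keep weights current in a stream.
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        w = (count / total) * idf
        if term in title_tokens:
            w *= title_boost          # assumed location factor
        if term in entities:
            w *= entity_boost         # assumed named-entity factor
        w *= 1.0 + burst.get(term, 0.0)  # assumed burstiness factor
        weights[term] = w
    # L2-normalize so that documents of different lengths are comparable.
    norm = math.sqrt(sum(v * v for v in weights.values()))
    if norm == 0.0:
        return weights
    return {t: v / norm for t, v in weights.items()}


if __name__ == "__main__":
    doc = ["quake", "hits", "tokyo", "quake"]
    df = {"quake": 2, "hits": 50, "tokyo": 5}
    w = weighted_tfidf(doc, {"quake"}, {"tokyo"}, df, n_docs=100)
    for term, value in sorted(w.items(), key=lambda kv: -kv[1]):
        print(f"{term}: {value:.3f}")
```

In this toy example the title term and the named entity rank above the common word, and passing a nonzero `burst` score for a term raises its weight relative to the others. How the three factors are actually combined and estimated in the paper would have to be taken from the full text.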