Hot topic extraction based on frequency, position, scattering and topical weight for time sliced news documents

2013 15th International Conference on Advanced Computing Technologies (ICACT). Pub Date: 2013-09-01. DOI: 10.1109/ICACT.2013.6710495

Y. Jahnavi, Y. Radhika


Citations: 4

Abstract

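Internet-based news documents are a primary medium for transmitting information, which makes detecting hot topics and tracking how events develop especially important. Because of the sheer volume of topics generated, it is practically impossible to view them all, so the topics must be ranked by importance. That importance depends on how frequently a topic appears, and it varies from one time slot to another. To extract hot topics, most text mining approaches based on the vector space model must first determine the weights of the feature terms. Existing traditional algorithms cannot retrieve hot terms with high accuracy because they do not consider position, scattering, or topicality. This paper presents an innovative and effective hot term extraction method that considers the position, scattering, and topicality of terms in addition to their frequency.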
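
The abstract names the four factors but does not say how each is computed or how they are combined. The Python sketch below is one plausible reading under stated assumptions, not the authors' method. It assumes, hypothetically, that frequency is a sublinear term count within the time slice, position is a boost for headline occurrences, scattering is the share of the slice's documents that mention the term, and topicality is the term's document rate in the current slice relative to a smoothed rate over earlier slices. The four factors are multiplied. `NewsDoc`, `hot_terms`, `TITLE_WEIGHT` and `SMOOTHING` are illustrative names and values, not taken from the paper.

```python
import math
from collections import Counter
from dataclasses import dataclass

# Hypothetical parameters: the abstract does not give the paper's actual values.
TITLE_WEIGHT = 2.0  # assumed weight of a headline occurrence (body occurrences weigh 1.0)
SMOOTHING = 1.0     # add-one smoothing so terms unseen in history get a finite topicality


@dataclass
class NewsDoc:
    title: list[str]  # tokenized headline (stop words already removed)
    body: list[str]   # tokenized body text


def hot_terms(current: list[NewsDoc], history: list[NewsDoc],
              k: int = 5) -> list[tuple[str, float]]:
    """Rank terms of the current time slice by frequency x position x scattering x topicality.

    This is a hypothetical weighting scheme, not the one defined in the paper.
    """
    if not current:
        return []
    title_tf = Counter(t for d in current for t in d.title)
    body_tf = Counter(t for d in current for t in d.body)
    # Document frequencies in the current slice and across all earlier slices.
    df_cur = Counter(t for d in current for t in set(d.title) | set(d.body))
    df_hist = Counter(t for d in history for t in set(d.title) | set(d.body))
    n_cur, n_hist = len(current), len(history)

    scores = {}
    for term in df_cur:
        tf = title_tf[term] + body_tf[term]
        frequency = 1.0 + math.log(tf)  # sublinear term frequency
        # Position: average weight per occurrence, from 1.0 (body only) up to TITLE_WEIGHT.
        position = (TITLE_WEIGHT * title_tf[term] + body_tf[term]) / tf
        # Scattering: share of documents in the current slice that mention the term.
        scattering = df_cur[term] / n_cur
        # Topicality: current document rate relative to the smoothed historical rate.
        hist_rate = (df_hist[term] + SMOOTHING) / (n_hist + 2 * SMOOTHING)
        topicality = scattering / hist_rate
        scores[term] = frequency * position * scattering * topicality
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


if __name__ == "__main__":
    history = [NewsDoc(["market", "update"], ["stocks", "market", "rose"]),
               NewsDoc(["weather"], ["rain", "market"])]
    current = [NewsDoc(["earthquake", "strikes"], ["earthquake", "damage", "market"]),
               NewsDoc(["earthquake", "relief"], ["aid", "earthquake"]),
               NewsDoc(["market", "dips"], ["stocks", "market"])]
    for term, score in hot_terms(current, history, k=2):
        print(term, round(score, 2))
    # prints:
    # earthquake 6.36
    # market 1.66
```

Multiplying the factors means a term must score reasonably on all four to rank highly; a weighted sum would be an equally plausible design, and the abstract does not say which the paper uses. In the toy example, `market` is frequent and widespread but its topicality falls below 1 because it was already common in the earlier slice, whereas `earthquake` is new and therefore ranks first.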
