An Improved Single-Pass Algorithm for Chinese Microblog Topic Detection and Tracking

Danfeng Yan, Enzheng Hua, Bo Hu
Published in: 2016 IEEE International Congress on Big Data (BigData Congress), 2016-06-01
DOI: 10.1109/BigDataCongress.2016.39
Citations: 17

Abstract

Microblogs are a very popular social platform and a source of news and trending information. Detecting and tracking hot topics in microblogs has attracted the attention of researchers at home and abroad. This paper focuses on topic detection and tracking for Chinese microblogs in the financial domain. We propose an incremental TF-IWF-IDF weight calculation method that takes each term's part of speech and position into account. This method addresses the problem that the IDF in TF-IDF is a constant value and cannot change dynamically with the dataset. The traditional feature vector does not consider the semantics and context of terms; to address this, the paper proposes a new feature vector representation that incorporates IWF into TF-IDF. This text representation is called the word vector based on incremental TF-IWF-IDF with term part of speech and position. The paper also proposes Two Steps of Single-Pass based on Multi Topic Centers (MC-TSP) to overcome the shortcomings of the traditional Single-Pass algorithm. In experimental comparison, the improved algorithm performs better than the traditional Single-Pass algorithm. Using the improved algorithm, a financial hot-topic detection and tracking model is designed and implemented. Applying this model in the financial domain improved the accuracy of topic detection and tracking.
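
For orientation, the following is a minimal Python sketch of the traditional Single-Pass baseline that the abstract says MC-TSP improves on, paired with an IDF table that is updated as posts arrive. It is not the authors' method: the abstract gives no formulas, so the IWF factor, the part-of-speech and position weights, the multiple topic centers, and the second step are all omitted. The names `IncrementalIdf`, `tfidf_vector`, `cosine`, and `single_pass`, the smoothed IDF formula, the running-sum topic center, and the threshold of 0.3 are assumptions chosen for illustration.

```python
import math
from collections import Counter


class IncrementalIdf:
    """Document-frequency table that grows as posts arrive (hypothetical helper)."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()

    def update(self, tokens):
        # Count each term once per post, so IDF reflects the data seen so far.
        self.n_docs += 1
        self.doc_freq.update(set(tokens))

    def idf(self, term):
        # Smoothed IDF; the smoothing is an assumption, not the paper's formula.
        return math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0


def tfidf_vector(tokens, idf_table):
    """Sparse TF-IDF vector (term -> weight) using the current IDF table."""
    counts = Counter(tokens)
    total = len(tokens)
    return {t: (c / total) * idf_table.idf(t) for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(posts, threshold=0.3):
    """Traditional Single-Pass clustering: scan the posts once; each post
    joins its most similar topic or, below the threshold, starts a new one."""
    idf_table = IncrementalIdf()
    topics = []  # each topic: {"center": sparse vector, "members": post indices}
    for i, tokens in enumerate(posts):
        idf_table.update(tokens)
        vec = tfidf_vector(tokens, idf_table)
        best, best_sim = None, 0.0
        for topic in topics:
            sim = cosine(vec, topic["center"])
            if sim > best_sim:
                best, best_sim = topic, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Single center kept as a running sum of member vectors
            # (cosine similarity ignores the overall scale).
            for t, w in vec.items():
                best["center"][t] = best["center"].get(t, 0.0) + w
        else:
            topics.append({"center": dict(vec), "members": [i]})
    return topics


# Toy, pre-tokenized posts (made up for illustration).
posts = [
    ["stock", "market", "rally"],
    ["stock", "market", "drop"],
    ["rain", "weather", "forecast"],
]
print([topic["members"] for topic in single_pass(posts)])
```

On the toy input the first two posts share a topic and the third starts a new one. In this baseline each topic has a single center and each post is assigned once, in arrival order, so the result depends on both the order of the posts and the chosen threshold.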