An Improved Single-Pass Algorithm for Chinese Microblog Topic Detection and Tracking
Danfeng Yan, Enzheng Hua, Bo Hu
2016 IEEE International Congress on Big Data (BigData Congress), 2016
DOI: 10.1109/BigDataCongress.2016.39
Citation count: 17
Abstract
Microblogs are a very popular social platform and a major source of news and popular information. Detecting and tracking hot topics on microblogs has attracted the attention of researchers in China and abroad. This paper focuses on topic detection and tracking for Chinese microblogs in the financial domain. We propose an incremental TF-IWF-IDF term-weighting method that also accounts for each term's part of speech and position. It addresses the problem that the IDF component of TF-IDF is a constant that cannot change dynamically as the dataset grows. Traditional feature vectors also ignore the semantics and context of terms; the paper addresses this with a new feature-vector representation that incorporates IWF into TF-IDF. This text representation is called a word vector based on incremental TF-IWF-IDF with term part-of-speech and position weighting. To overcome the shortcomings of the traditional Single-Pass algorithm, the paper proposes Two Steps of Single-Pass based on Multi Topic Centers (MC-TSP). In experimental comparisons, the improved algorithm performs better than the traditional Single-Pass algorithm. Using the improved algorithm, we design and implement a model for detecting and tracking hot financial topics, and applying this model in the financial domain improved the accuracy of topic detection and tracking.
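The abstract describes the weighting and clustering only at a high level, so the following Python sketch illustrates the two ingredients it builds on rather than the paper's own method: (1) term weights whose IDF and IWF statistics are updated incrementally as posts arrive, scaled by part-of-speech and position factors, and (2) the traditional Single-Pass clustering that MC-TSP is designed to improve. Several assumptions are made: IWF is taken as the commonly used inverse word frequency (log of total term occurrences over the occurrences of a term), the factors are simply multiplied, both log ratios are smoothed to stay positive, and `POS_WEIGHTS`, `LEAD_LEN`, `LEAD_WEIGHT`, `IncrementalWeighter`, and `single_pass` are hypothetical names and values. Posts are assumed to be already segmented into (term, part-of-speech) pairs. The two-step, multi-center refinements of MC-TSP are not modeled.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights (NOT from the paper): nouns and verbs
# are assumed to carry more topical signal than other word classes.
POS_WEIGHTS = {"n": 1.5, "v": 1.0}
DEFAULT_POS_WEIGHT = 0.5
# Hypothetical position rule: terms among the first LEAD_LEN tokens get a boost.
LEAD_LEN = 5
LEAD_WEIGHT = 1.2


class IncrementalWeighter:
    """Corpus statistics that grow as posts arrive, so IDF and IWF change
    with the dataset instead of being fixed in advance."""

    def __init__(self):
        self.num_docs = 0           # N: posts seen so far
        self.doc_freq = Counter()   # df[t]: posts containing term t
        self.total_words = 0        # W: term occurrences seen so far
        self.word_freq = Counter()  # wf[t]: occurrences of term t

    def update(self, terms):
        self.num_docs += 1
        self.doc_freq.update(set(terms))
        self.total_words += len(terms)
        self.word_freq.update(terms)

    def weigh(self, tokens):
        """tokens: list of (term, pos) pairs for one post -> {term: weight}."""
        terms = [t for t, _ in tokens]
        self.update(terms)  # incremental step: statistics now include this post
        tf = Counter(terms)
        vec = {}
        for i, (term, pos) in enumerate(tokens):
            if term in vec:
                continue  # weight each distinct term once, at its first position
            # Smoothed log ratios (an assumption) keep every weight positive.
            idf = math.log((1 + self.num_docs) / (1 + self.doc_freq[term])) + 1
            iwf = math.log((1 + self.total_words) / (1 + self.word_freq[term])) + 1
            pos_w = POS_WEIGHTS.get(pos, DEFAULT_POS_WEIGHT)
            loc_w = LEAD_WEIGHT if i < LEAD_LEN else 1.0
            vec[term] = (tf[term] / len(terms)) * iwf * idf * pos_w * loc_w
        return vec


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(vectors, threshold):
    """Traditional Single-Pass clustering: each vector joins the most similar
    existing cluster if similarity >= threshold, otherwise starts a new one."""
    centroids, labels = [], []
    for vec in vectors:
        best, best_sim = -1, 0.0
        for k, c in enumerate(centroids):
            s = cosine(vec, c)
            if s > best_sim:
                best, best_sim = k, s
        if best >= 0 and best_sim >= threshold:
            c = centroids[best]
            for t, w in vec.items():  # summed centroid; cosine ignores scale
                c[t] = c.get(t, 0.0) + w
            labels.append(best)
        else:
            centroids.append(dict(vec))  # copy so input vectors stay unchanged
            labels.append(len(centroids) - 1)
    return labels


# Usage: three pre-segmented posts (stock market rises / stock market surges /
# housing-price regulation).
weighter = IncrementalWeighter()
posts = [
    [("股市", "n"), ("上涨", "v")],
    [("股市", "n"), ("大涨", "v")],
    [("房价", "n"), ("调控", "v")],
]
vectors = [weighter.weigh(p) for p in posts]
labels = single_pass(vectors, threshold=0.3)
print(labels)  # prints [0, 0, 1]
```

Because each post is weighted with statistics that already include it, the same term (here 股市) receives different IDF and IWF values as the stream grows, which is the dynamic behavior the incremental scheme targets. A production system would also need to decide whether earlier vectors and cluster centers are reweighted as these statistics drift; this sketch leaves them as computed at arrival time.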