An Efficient Algorithm of Hot Events Detection in Text Streams
Junliang Bai, Jun Guo, Guang Chen, Weiran Xu, Gang Du
2010 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, published 2010-10-10
DOI: https://doi.org/10.1109/CYBERC.2010.65
Citation count: 2
Abstract
Hot event detection in text streams has drawn increasing attention in recent work on sequential data mining. Unlike the traditional topic detection and tracking (TDT) task, which finds a cluster for every real event, hot event detection identifies only the hot events that concern the public. This paper proposes a novel approach to identifying those events based on burst terms, term co-occurrence, and a generative probabilistic model. Experiments on large text stream sets crawled from the Web suggest that our algorithm can work online and identify hot events effectively and efficiently.
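
The abstract names burst terms and term co-occurrence without giving formulas, so the following is only a minimal illustrative sketch of those two ideas, not the authors' algorithm. It assumes documents are already tokenized into lists of terms and grouped into time windows; the function names, the `ratio` and `min_count` thresholds, and the add-one smoothing are all hypothetical choices made for this example. The generative probabilistic model mentioned in the abstract is not sketched here.

```python
from collections import Counter
from itertools import combinations


def burst_terms(history, current, ratio=3.0, min_count=2):
    """Return terms whose document frequency in the current window is
    at least `ratio` times their smoothed average over past windows.

    history: list of past windows, each a list of tokenized documents.
    current: the current window, a list of tokenized documents.
    The thresholds are assumptions for illustration only.
    """
    n_hist = max(len(history), 1)

    # Document frequency of each term summed over all past windows.
    past = Counter()
    for window in history:
        for doc in window:
            past.update(set(doc))

    # Document frequency of each term in the current window.
    now = Counter()
    for doc in current:
        now.update(set(doc))

    bursts = set()
    for term, count in now.items():
        baseline = past[term] / n_hist  # average per past window
        # Add-one smoothing so unseen terms do not divide by zero
        # and rare terms need real support to count as bursting.
        if count >= min_count and count >= ratio * (baseline + 1.0):
            bursts.add(term)
    return bursts


def cooccurrence(current, bursts):
    """Count how often each pair of burst terms shares a document."""
    pairs = Counter()
    for doc in current:
        present = sorted(set(doc) & bursts)
        pairs.update(combinations(present, 2))
    return pairs
```

Burst terms that co-occur frequently could then be grouped as candidates for one hot event; how the paper actually forms and scores such groups is not stated in the abstract.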