Event detection from Twitter data
Jagrati Singh, Ishneet Kaur, Anil Kumar Singh
2019 4th International Conference on Information Systems and Computer Networks (ISCON), published 2019-11-01
DOI: 10.1109/ISCON47742.2019.9036286 (https://doi.org/10.1109/ISCON47742.2019.9036286)

Detecting events from Twitter helps people extract valuable information about real-world events. Automating this task is challenging because microblogging data is short and noisy. Topic modeling algorithms such as Latent Dirichlet Allocation (LDA) are the most popular way to extract topics from news articles, but they are not suitable for microblogging content because of the data sparsity problem. In this paper, we propose a method that handles data sparsity and makes the LDA topic model suitable for Twitter data: a super tweet (an aggregation of similar tweets), rather than a single tweet, is treated as a document, and the internal structure of the model is left unmodified. Extensive experiments on real-time Twitter data show that our approach outperforms the baseline approaches.
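
The aggregation idea can be illustrated with a minimal sketch. The abstract does not say how tweet similarity is measured or how super tweets are formed, so everything here is an assumption made for illustration: the Jaccard overlap of token sets, the greedy single-pass grouping, the `threshold` value, and the function names `tokenize`, `jaccard`, and `build_super_tweets` are all hypothetical and are not the authors' method.

```python
import re


def tokenize(text):
    """Lowercase a tweet and return its set of word tokens.

    URLs and @mentions are dropped first (an illustrative cleaning choice).
    """
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return set(re.findall(r"[a-z0-9']+", text))


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_super_tweets(tweets, threshold=0.3):
    """Group similar tweets and return one concatenated document per group.

    Assumption: a tweet joins the existing group whose vocabulary it
    overlaps most, provided the Jaccard similarity reaches `threshold`;
    otherwise it starts a new group.
    """
    groups = []  # each group: {"vocab": set of tokens, "tweets": list of str}
    for tweet in tweets:
        tokens = tokenize(tweet)
        best, best_sim = None, threshold
        for group in groups:
            sim = jaccard(tokens, group["vocab"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"vocab": set(tokens), "tweets": [tweet]})
        else:
            best["vocab"] |= tokens
            best["tweets"].append(tweet)
    # Each super tweet is the plain concatenation of its member tweets.
    return [" ".join(group["tweets"]) for group in groups]


if __name__ == "__main__":
    sample = [
        "Earthquake hits the city center",
        "Strong earthquake hits city center tonight",
        "Team wins the football final",
        "Football final: team wins again",
    ]
    for doc in build_super_tweets(sample):
        print(doc)
```

Each returned string is longer and denser than any single tweet, so it can be passed as one document to an off-the-shelf LDA implementation with no change to the model itself, which is the property the abstract emphasizes.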