{"title":"Word Embedding based Clustering to Detect Topics in Social Media","authors":"C. Comito, Agostino Forestiero, C. Pizzuti","doi":"10.1145/3350546.3352518","DOIUrl":null,"url":null,"abstract":"Social media are playing an increasingly important role in reporting major events happening in the world. However, detecting events and topics of interest from social media is a challenging task due to the huge magnitude of the data and the complex semantics of the language being processed. The paper proposes an online algorithm to discover topics that incrementally groups short text by incorporating the textual content with latent feature vector representations of words appearing in the text, trained on very large corpora to improve the check-in topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, the approach obtains significant improvements with respect to classical topic detection methods. CCS CONCEPTS• Information systems $\\rightarrow$ Clustering; Data stream mining; Data extraction and integration; • Computing methodologies $\\rightarrow$ Neural networks.","PeriodicalId":171168,"journal":{"name":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3350546.3352518","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29
Abstract
Social media are playing an increasingly important role in reporting major events happening in the world. However, detecting events and topics of interest from social media is a challenging task due to the huge magnitude of the data and the complex semantics of the language being processed. The paper proposes an online algorithm to discover topics that incrementally groups short text by incorporating the textual content with latent feature vector representations of words appearing in the text, trained on very large corpora to improve the check-in topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, the approach obtains significant improvements with respect to classical topic detection methods. CCS CONCEPTS• Information systems $\rightarrow$ Clustering; Data stream mining; Data extraction and integration; • Computing methodologies $\rightarrow$ Neural networks.