Event attendance prediction using social media
Ubaid Mehmood, I. Moser, Nicole Ronald
Proceedings of the Australasian Computer Science Week Multiconference, 2020-02-04
DOI: 10.1145/3373017.3373033
Citation count: 3
Abstract
Predicting attendance at events a few hours in advance can be useful for organisers and road users alike. Several studies attempt to detect attendance at the time of the event from social media, using geo-tagging or event-based social networks. In this study, we present a novel attendance classifier based on an LSTM and show that it outperforms other machine learning algorithms on two recent data sets with a few thousand attendees. The attendance prediction relies on the content of tweets alone, without the need for network or geospatial information. Analysing the tweets requires text pre-processing, a sequence of steps that is implicit in the classification process and generally not discussed in other studies. We conducted a sensitivity analysis of the text pre-processing steps and found that some steps, such as stemming and the removal of a custom list of stop words, did not improve the results, whereas removing mentions, punctuation and numbers proved very useful. The best-performing combination was identical for both data sets and yielded a 6% improvement in classification performance over the worst-performing combination.
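A sensitivity analysis of pre-processing steps like the one described above can be sketched as an exhaustive sweep over every on/off combination of the steps. This is a minimal illustration under stated assumptions, not the authors' pipeline. The fixed step order, the lower-casing, the stop-word list `STOP_WORDS` and the toy stemmer `crude_stem` (a stand-in for a real stemmer such as Porter's) are all hypothetical. The `evaluate` callable is a placeholder for whatever trains and scores the classifier (in the paper, an LSTM) on the pre-processed tweets, and the example tweet is invented.

```python
import itertools
import re
import string
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical custom stop-word list; the paper's actual list is not given.
STOP_WORDS = {"the", "a", "an", "at", "to", "is", "and", "of", "in", "for"}

MENTION_RE = re.compile(r"@\w+")  # Twitter mentions such as @venue
NUMBER_RE = re.compile(r"\d+")    # runs of digits
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def crude_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Text-level steps run before tokenisation, in this fixed order:
# mentions must go before punctuation, or "@venue" would become "venue".
TEXT_STEPS: Dict[str, Callable[[str], str]] = {
    "mentions": lambda t: MENTION_RE.sub(" ", t),
    "numbers": lambda t: NUMBER_RE.sub(" ", t),
    "punctuation": lambda t: t.translate(PUNCT_TABLE),
}
# Token-level steps run after whitespace tokenisation.
TOKEN_STEPS: Dict[str, Callable[[List[str]], List[str]]] = {
    "stop_words": lambda toks: [w for w in toks if w not in STOP_WORDS],
    "stemming": lambda toks: [crude_stem(w) for w in toks],
}
ALL_STEPS = list(TEXT_STEPS) + list(TOKEN_STEPS)


def preprocess(text: str, enabled: FrozenSet[str]) -> List[str]:
    """Lower-case, apply the enabled steps in a fixed order, return tokens."""
    text = text.lower()
    for name, step in TEXT_STEPS.items():
        if name in enabled:
            text = step(text)
    tokens = text.split()
    for name, step in TOKEN_STEPS.items():
        if name in enabled:
            tokens = step(tokens)
    return tokens


def sensitivity_analysis(
    tweets: Sequence[str],
    labels: Sequence[int],
    evaluate: Callable[[List[List[str]], Sequence[int]], float],
) -> List[Tuple[float, FrozenSet[str]]]:
    """Score every on/off combination of steps (2**5 = 32), best first."""
    results = []
    for mask in itertools.product((False, True), repeat=len(ALL_STEPS)):
        enabled = frozenset(s for s, on in zip(ALL_STEPS, mask) if on)
        docs = [preprocess(t, enabled) for t in tweets]
        results.append((evaluate(docs, labels), enabled))
    # Sort on the score only, so ties never compare frozensets.
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    tweet = "Heading to the stadium @venue with 2 mates for the final!!"
    print(preprocess(tweet, frozenset({"mentions", "numbers", "punctuation"})))
    # ['heading', 'to', 'the', 'stadium', 'with', 'mates', 'for', 'the', 'final']
    print(preprocess(tweet, frozenset(ALL_STEPS)))
    # ['head', 'stadium', 'with', 'mate', 'final']
```

Because the sweep is exhaustive, it needs 2^k classifier runs for k steps; with the five steps named in the abstract that is 32 runs per data set. Comparing the top and bottom of the returned list corresponds to the best-versus-worst comparison reported in the abstract.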