Improving the Classification of Drunk Texting in Tweets Using Semantic Enrichment
Marcos A. Grzeça, K. Becker, R. Galante
2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), December 2018
DOI: 10.1109/WI.2018.00-90
Citations: 2
Abstract
Excessive alcohol consumption is a worldwide problem, and social networks such as Twitter can provide valuable data that help in understanding factors related to alcoholism, particularly among young people. Identifying drunk tweets (i.e., tweets posted under the influence of alcohol) is difficult because tweets are short, sparse, and written with a diverse, internet-specific vocabulary, possibly containing errors caused by alcohol. In this paper, we propose an enrichment framework that integrates conceptual and semantic features to expand and generalize the vocabulary, providing context for tweet terms. The framework also handles misspellings and selects the most discriminative of the features produced by contextual enrichment. Our approach outperformed the baseline, improving recall by 13.79 percentage points with no significant loss of precision. We illustrate the value of drunk tweet classification with an exploratory analysis that reveals the demographics of drunk tweeters and the properties of their tweets.
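The abstract does not say how the enrichment is implemented. As a rough illustration of the general idea only (not the authors' method), the Python sketch below expands a tweet's tokens with generalized concept tokens after correcting near-miss spellings. It assumes a small, hypothetical hand-written concept map (`CONCEPTS`) in place of the conceptual and semantic resources the paper uses, and swaps in the standard library's `difflib.get_close_matches` as a simple stand-in for misspelling handling. Feature selection is not covered.

```python
import difflib
import re

# Hypothetical concept map (an assumption for illustration only): surface terms
# are generalized to shared concept tokens, so "vodka" and "tequila" both
# contribute the same feature even if only one appears in the training data.
CONCEPTS = {
    "vodka": ["CONCEPT:alcoholic_beverage"],
    "tequila": ["CONCEPT:alcoholic_beverage"],
    "beer": ["CONCEPT:alcoholic_beverage"],
    "hangover": ["CONCEPT:alcohol_effect"],
    "party": ["CONCEPT:social_event"],
}

# Lowercase word tokens; hashtags, emoji, and punctuation are simply dropped.
TOKEN_RE = re.compile(r"[a-z']+")


def normalize(token, vocabulary, cutoff=0.8):
    """Map a possibly misspelled token to its closest known term, if any."""
    if token in vocabulary:
        return token
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def enrich(tweet, concepts=CONCEPTS):
    """Return the tweet's (normalized) tokens followed by their concept tokens."""
    vocabulary = list(concepts)
    tokens = [normalize(t, vocabulary) for t in TOKEN_RE.findall(tweet.lower())]
    features = list(tokens)
    for token in tokens:
        features.extend(concepts.get(token, []))
    return features


if __name__ == "__main__":
    # "vodkaa" is corrected to "vodka", which then adds the concept token.
    print(enrich("So much vodkaa tonight"))
    # → ['so', 'much', 'vodka', 'tonight', 'CONCEPT:alcoholic_beverage']
```

The `cutoff` trades corrected misspellings against false matches: with a vocabulary this small, an unrelated short word such as "bee" would be pulled to "beer", so a realistic setup would normalize against a full dictionary before looking up concepts.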