Víctor D. Cortés, J. D. Velásquez, Carlos F. Ibáñez
Twitter for marijuana infodemiology
Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, 2017-08-23
https://doi.org/10.1145/3106426.3106541
Today, online social networks seem to be good tools for quickly monitoring what is going on in the population, since they provide environments where users can freely share large amounts of information about their own lives. Given the well-known limitations of surveys, this novel kind of data can be used to get additional real-time insights from people and to understand their actual behavior related to drug use. The aim of this work is to use text messages (tweets) and relationships between Chilean Twitter users to predict marijuana use among them. To do this, we collected Twitter accounts using a location-based criterion and built a set of features based on the tweets they posted and on egocentric network metrics. To get tweet-based features, tweets were filtered using marijuana-related keywords, and a set of 1000 tweets was manually labeled to train algorithms capable of predicting marijuana use in tweets. In addition, a sentiment classifier for tweets was developed using the TASS corpus. We then conducted a survey to obtain real marijuana use labels for the accounts, and these labels were used to train supervised machine learning algorithms. The per-user marijuana use classifier had precision, recall and F-measure results close to 0.7, implying significant predictive power of the selected variables. We obtained a model capable of predicting marijuana use among Twitter users and estimating their opinion about marijuana. This information can be used as an efficient (fast and low-cost) tool for marijuana surveillance and to support decision-making about drug policies.
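To make two of the steps named in the abstract concrete, the following is a minimal Python sketch of keyword-based tweet filtering and of the precision, recall and F-measure computation used to evaluate a binary classifier. It is an illustration only, not the authors' implementation: the keyword list, the function names `mentions_keyword` and `precision_recall_f1`, and the toy labels are all assumptions, since the abstract does not give the actual keywords, features or data.

```python
import re

# Hypothetical keyword list; the abstract does not state the keywords actually used.
KEYWORDS = ("marihuana", "marijuana", "cannabis")


def mentions_keyword(text, keywords=KEYWORDS):
    """Return True if the text contains any keyword as a whole word (case-insensitive)."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F-measure (F1) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy tweets, invented for illustration.
    tweets = ["Legalicen la marihuana ya", "Buen día a todos"]
    print([t for t in tweets if mentions_keyword(t)])

    # Toy ground-truth and predicted labels, invented for illustration.
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a pipeline like the one described, the filter would narrow the collected tweets before manual labeling, and the metric function would be applied to held-out survey labels versus classifier predictions.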