A machine learning based approach to identify geo-location of Twitter users

Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing. Pub Date: 2017-03-22. DOI: 10.1145/3018896.3018969

Aytuğ Onan


Citations: 6

Abstract

Twitter, a popular microblogging platform, has attracted great attention. It enables people from all over the world to interact in an extremely personal way. The immense quantity of user-generated text messages available on Twitter could serve as an important source of information for researchers and practitioners. This information may be used for many purposes, such as event detection, public health, and crisis management. To coordinate such activities effectively, identifying Twitter users' geo-locations is extremely important. Although online social networks can provide some geo-location information based on GPS coordinates, Twitter suffers from a geo-location sparseness problem. Identifying users' geo-locations from the content of the messages they send therefore becomes extremely important. In this regard, this paper presents a machine learning based approach to the problem. In this study, the corpus is represented as a word vector. To obtain a classification scheme with high predictive performance, the performance of five classification algorithms, three ensemble methods, and two feature selection methods is evaluated. Among the compared algorithms, the highest result (84.85%) is achieved by an AdaBoost ensemble of Random Forest when the feature set is selected using the consistency-based feature selection method in conjunction with best-first search.
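The abstract names two steps that a small example may make more concrete: representing the corpus as a word vector and selecting features by consistency. The sketch below is only an illustration under stated assumptions, not the paper's implementation. The four tweets and the two city labels are invented, the helper names (`bag_of_words`, `inconsistency_rate`, `greedy_consistent_subset`) are hypothetical, and a simple greedy forward search stands in for the best-first search used in the paper. The classification stage (AdaBoost ensemble of Random Forest) is not shown.

```python
from collections import Counter, defaultdict


def bag_of_words(texts):
    """Return a sorted vocabulary and one binary word-presence vector per text."""
    tokenized = [set(t.lower().split()) for t in texts]
    vocab = sorted(set().union(*tokenized))
    vectors = [[1 if w in toks else 0 for w in vocab] for toks in tokenized]
    return vocab, vectors


def inconsistency_rate(vectors, labels, subset):
    """Share of examples that disagree with the majority label among the
    examples that look identical on the chosen feature subset."""
    groups = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        pattern = tuple(vec[i] for i in subset)
        groups[pattern][label] += 1
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(labels)


def greedy_consistent_subset(vectors, labels):
    """Greedy forward search (an assumed simplification of best-first search):
    repeatedly add the feature that lowers the inconsistency rate the most,
    and stop when no remaining feature improves it."""
    n_features = len(vectors[0])
    chosen = []
    best = inconsistency_rate(vectors, labels, chosen)
    while best > 0:
        candidates = [
            (inconsistency_rate(vectors, labels, chosen + [i]), i)
            for i in range(n_features)
            if i not in chosen
        ]
        if not candidates:
            break
        score, idx = min(candidates)  # ties go to the lowest feature index
        if score >= best:
            break
        chosen.append(idx)
        best = score
    return chosen, best


# Invented toy corpus: tweet text paired with a made-up city label.
texts = [
    "rain on the bosphorus ferry",
    "bosphorus traffic again",
    "subway delay in brooklyn",
    "brooklyn bagel morning",
]
labels = ["Istanbul", "Istanbul", "NewYork", "NewYork"]

vocab, vectors = bag_of_words(texts)
chosen, rate = greedy_consistent_subset(vectors, labels)
print([vocab[i] for i in chosen], rate)
```

On this toy corpus a single word is enough to separate the two labels, so the search stops after one step. On real tweets the selected subset would be much larger, and it would then be passed to the classifiers under comparison.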
