Does geographical location have an impact on data samples extracted from Twitter?
R. Ivanova, Stefan Sobernig, Mark Strembeck
2022 Ninth International Conference on Social Networks Analysis, Management and Security (SNAMS), published 2022-11-29
DOI: https://doi.org/10.1109/SNAMS58071.2022.10062544
Citations: 0
Abstract
We report on an experiment that used ten different machines running on a standardized cloud platform in five different geographical locations around the globe (Frankfurt/Germany, Mumbai/India, Sydney/Australia, Seoul/South Korea, Virginia/USA) to collect datasets using Twitter's public free-of-charge API. Each of the ten machines extracted tweets at exactly the same time and with exactly the same Twitter API parameters. We found that the characteristics of the datasets collected in different locations vary considerably, potentially affecting any analysis performed on such location-biased data. For example, the share of exactly identical tweets (i.e., tweets for which all 90 metadata attributes are the same on all ten machines) lies only between 0.15% and 20%. Based on these findings, we derive recommendations on how to mitigate the location bias in practice.
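The "exactly identical tweets" measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `canonical` and `identical_share`, the representation of each machine's dataset as a dictionary keyed by tweet ID, and the choice of denominator (all tweet IDs seen on any machine) are assumptions made for illustration only, since the abstract does not specify them.

```python
import json
from typing import Dict, List

# Assumption: each tweet is a flat or nested dict of metadata attributes.
Tweet = Dict[str, object]


def canonical(tweet: Tweet) -> str:
    """Serialize a tweet with a stable key order so equal tweets compare equal."""
    return json.dumps(tweet, sort_keys=True, default=str)


def identical_share(datasets: List[Dict[str, Tweet]]) -> float:
    """Fraction of tweets that appear on every machine with identical attributes.

    Assumption: the denominator is the set of tweet IDs seen on any machine.
    """
    all_ids = set().union(*(d.keys() for d in datasets))
    if not all_ids:
        return 0.0
    identical = 0
    for tweet_id in all_ids:
        # A tweet counts only if every machine collected it ...
        if all(tweet_id in d for d in datasets):
            # ... and every collected copy has the same attribute values.
            if len({canonical(d[tweet_id]) for d in datasets}) == 1:
                identical += 1
    return identical / len(all_ids)


# Hypothetical toy data for two machines (not taken from the paper).
machine_a = {
    "1": {"text": "hi", "retweet_count": 3},
    "2": {"text": "yo", "retweet_count": 0},
}
machine_b = {
    "1": {"text": "hi", "retweet_count": 3},
    "2": {"text": "yo", "retweet_count": 1},
    "3": {"text": "new", "retweet_count": 0},
}
share = identical_share([machine_a, machine_b])
```

In this toy example only tweet "1" is identical on both machines: tweet "2" differs in one attribute and tweet "3" was collected by only one machine. A different denominator, such as the tweets present on all machines, would yield a different share.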