H. N. Serere, Bernd Resch, C. Havas, Andreas Petutschnig
Title: Extracting and Geocoding Locations in Social Media Posts: A Comparative Analysis
Journal: GI_Forum
Published: 2021-01-01
DOI: https://doi.org/10.1553/giscience2021_02_s167
Citations: 3
Abstract
Geo-social media have become an established data source for spatial analysis of geographic and social processes in various fields. However, only a small share of geo-social media data are explicitly georeferenced, which often compromises the reliability of the analysis results by excluding large volumes of data from the analysis. To increase the number of georeferenced tweets, inferred locations can be extracted from the texts of social media posts. We propose a customized workflow for location extraction from tweets and subsequent geocoding. We compare the results of two methods: DBpedia Spotlight (using linked Wikipedia entities), and spaCy combined with the geocoding methods of OpenStreetMap Nominatim. The results suggest that the workflow using spaCy and Nominatim identifies more locations than DBpedia Spotlight. For 50,616 tweets posted within California, USA, the granularity of the extracted locations is reasonable. However, several directions for future research were identified, including improved semantic analysis, the creation of a cascading workflow, and the need to integrate different data sources in order to increase reliability and spatial accuracy.
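The spaCy and Nominatim workflow named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe their entity labels, preprocessing, or query parameters, so the label set `LOCATION_LABELS`, the helper names, the `countrycodes=us` restriction, and the one-result limit are all assumptions made for illustration. The spaCy pipeline is passed in as `nlp` so that any loaded model can be used.

```python
import json
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

# Assumption: spaCy entity labels treated as place names in this sketch.
LOCATION_LABELS = {"GPE", "LOC", "FAC"}


def extract_locations(text, nlp):
    """Return the text of every entity whose label is in LOCATION_LABELS."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in LOCATION_LABELS]


def build_query_url(place, country_codes="us"):
    """Build a Nominatim free-text search URL for one place name."""
    params = {
        "q": place,
        "format": "jsonv2",
        "limit": 1,
        "countrycodes": country_codes,  # assumption: restrict hits to the USA
    }
    return NOMINATIM_URL + "?" + urllib.parse.urlencode(params)


def parse_first_hit(payload):
    """Return (lat, lon) of the first result in a Nominatim JSON reply, or None."""
    hits = json.loads(payload)
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lon"])


def geocode(place, user_agent="location-extraction-sketch"):
    """Query Nominatim for one place name (performs a network request)."""
    request = urllib.request.Request(
        build_query_url(place), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_first_hit(response.read().decode("utf-8"))
```

With spaCy and an English model installed, a tweet could be processed as `nlp = spacy.load("en_core_web_sm")`, then `[geocode(p) for p in extract_locations(tweet_text, nlp)]`. The public Nominatim service requires an identifying User-Agent and limits clients to one request per second, so a corpus of tens of thousands of tweets would need throttling, caching, or a self-hosted instance.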