Unveiling the impact of machine learning algorithms on the quality of online geocoding services: a case study using COVID-19 data
Batuhan Kilic, Onur Can Bayrak, Fatih Gülgen, Mert Gurturk, Perihan Abay
Journal of Geographical Systems (JCR Q1, Geography; impact factor 2.8). Published 2024-01-25. DOI: 10.1007/s10109-023-00435-8 (https://doi.org/10.1007/s10109-023-00435-8)
Abstract
Addresses are a key component of everyday mobility. Global map platforms and location-based services use address data to pinpoint geographically referenced locations. Geocoding provided by online platforms is useful for the spatial tracking of reported cases and controls in the spatial analysis of infectious diseases such as COVID-19. The first and most critical phase of geocoding is address matching. However, because of typographical errors, inconsistent abbreviations, and incomplete or malformed addresses, matching can seldom be performed with 100% accuracy. This research examines machine learning classifiers that can be used to measure the consistency of address matching results produced by online geocoding services, and identifies the best-performing classifier. Seven machine learning classifiers were compared using several text similarity measures, which score the match between the input address data and the services' output. The test data came from four distinct online geocoding services applied to 925 addresses in Türkiye. The Random Forest classifier was the most accurate in the address matching procedure. While these results hold for similar datasets in Türkiye, additional research is required to determine whether they apply to data from other countries.
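To make the idea of text similarity measures between an input address and a geocoder's returned address concrete, here is a minimal sketch. The abstract does not name the specific measures used, so the three below (normalized edit similarity, token Jaccard overlap, and a sequence-matching ratio) are assumptions chosen for illustration, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(address: str) -> str:
    """Lowercase and collapse whitespace (a deliberately simple normalizer).

    Note: str.lower() is not Turkish-locale-aware, so real Turkish address
    data would likely need more careful case folding.
    """
    return " ".join(address.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_features(input_address: str, returned_address: str) -> dict:
    """Return illustrative similarity scores in [0, 1] for one address pair."""
    a, b = normalize(input_address), normalize(returned_address)
    longest = max(len(a), len(b))
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return {
        "edit_similarity": 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest,
        "token_jaccard": 1.0 if not union else len(tokens_a & tokens_b) / len(union),
        "sequence_ratio": SequenceMatcher(None, a, b).ratio(),
    }
```

Scores like these, computed for each input/output pair, could serve as feature vectors for a classifier such as the Random Forest the study found most accurate; how the authors actually built their features is not stated in the abstract.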
About the journal:
The Journal of Geographical Systems (JGS) is an interdisciplinary peer-reviewed academic journal that aims to encourage and promote high-quality scholarship on new theoretical or empirical results, models and methods in the social sciences. It solicits original papers with a spatial dimension that can be of interest to social scientists. Coverage includes regional science, economic geography, spatial economics, regional and urban economics, GIScience and GeoComputation, big data and machine learning. Spatial analysis, spatial econometrics and statistics are strongly represented.
One of the distinctive features of the journal is its concern for the interface between modeling, statistical techniques and spatial issues in a wide spectrum of related fields. An important goal of the journal is to encourage a spatial perspective in the social sciences that emphasizes geographical space as a relevant dimension to our understanding of socio-economic phenomena.
Contributions should be of high quality, be technically well crafted, make a substantial contribution to the subject and contain a spatial dimension. The journal also aims to publish review and survey articles that make recent theoretical and methodological developments more readily accessible to the audience of the journal.
All papers of this journal have undergone rigorous double-blind peer-review, based on initial editor screening and with at least two peer reviewers.
Officially cited as J Geogr Syst