Integration of linked data sources for gazetteer expansion
T. Moura, C. Davis
Proceedings of the 8th Workshop on Geographic Information Retrieval, 2014-11-04
DOI: 10.1145/2675354.2675357
Citation count: 7
Abstract
Determining the geographic scope of documents is important for many applications in geographic information retrieval (GIR). Many techniques rely on gazetteers as a source of reference data, but creating and maintaining gazetteers remains a complex and demanding task. We propose using linked data sources to assemble gazetteer data that is both broad (e.g., planetary in coverage) and deep (e.g., down to urban detail). Linked data sources also make it possible to enrich the resulting gazetteer with geographic and semantic relationships involving place names and other geographic and non-geographic terms, expanding the possibilities for solving typical GIR problems such as disambiguation and filtering. This work presents the results of combining two linked data sources of gazetteer data, GeoNames and DBPedia, to populate an integrated and semantically enriched gazetteer. To identify matching places, we used evidence contained in attributes such as Wikipedia URLs, Linked Data predicates indicating that places in both sources are the same, and some additional criteria. The resulting gazetteer contains 8,729,833 places, of which 426,317 are found in both data sources. We analyze this relatively small overlap, which indicates that GeoNames and DBPedia are complementary and typically cover different classes of places. This suggests that further expansion can be achieved by integrating gazetteer data from additional Linked Data sources.
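The matching evidence described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Place` record, the URI prefixes, and the functions `normalize_wikipedia_url`, `match_places`, and `overlap_share` are all hypothetical names introduced here, and the sketch assumes that equivalence links (for example, `owl:sameAs`-style predicates) have already been extracted into a `same_as` set. The "additional criteria" mentioned in the abstract are not specified there and are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional
from urllib.parse import unquote, urlsplit


@dataclass(frozen=True)
class Place:
    """Hypothetical, simplified place record (not the paper's schema)."""
    uri: str                                # identifier within its source
    name: str
    wikipedia_url: Optional[str] = None     # attribute-level evidence
    same_as: FrozenSet[str] = frozenset()   # URIs this record is declared equal to


def normalize_wikipedia_url(url: str) -> str:
    """Reduce a Wikipedia URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    # Ignore the scheme, decode percent-escapes, and use underscores for spaces.
    path = unquote(parts.path).replace(" ", "_").rstrip("/")
    return parts.netloc.lower() + path


def match_places(geonames: Iterable[Place], dbpedia: Iterable[Place]) -> Dict[str, str]:
    """Return {geonames_uri: dbpedia_uri} for places judged to be the same."""
    dbpedia = list(dbpedia)
    dbpedia_uris = {p.uri for p in dbpedia}
    # Equivalence links declared on the DBPedia side, indexed by their target.
    linked_from_dbpedia = {target: p.uri for p in dbpedia for target in p.same_as}
    # DBPedia places indexed by normalized Wikipedia URL.
    by_wikipedia = {
        normalize_wikipedia_url(p.wikipedia_url): p.uri
        for p in dbpedia
        if p.wikipedia_url
    }

    matches: Dict[str, str] = {}
    for g in geonames:
        # 1. Explicit equivalence link declared on the GeoNames side.
        match = next((u for u in sorted(g.same_as) if u in dbpedia_uris), None)
        # 2. Explicit equivalence link declared on the DBPedia side.
        if match is None:
            match = linked_from_dbpedia.get(g.uri)
        # 3. Fall back to a shared Wikipedia URL.
        if match is None and g.wikipedia_url:
            match = by_wikipedia.get(normalize_wikipedia_url(g.wikipedia_url))
        if match is not None:
            matches[g.uri] = match
    return matches


def overlap_share(shared: int, total: int) -> float:
    """Fraction of places in the integrated gazetteer found in both sources."""
    return shared / total
```

Assuming one-to-one matches, the integrated gazetteer in this sketch has `len(geonames) + len(dbpedia) - len(matches)` entries. With the figures reported in the abstract, `overlap_share(426317, 8729833)` is roughly 0.049, so just under 5% of places appear in both sources.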