Geospatial Knowledge in Housing Advertisements: Capturing and Extracting Spatial Information from Text
L. Cadorel, Alicia Blanchi, A. Tettamanzi
Proceedings of the 11th on Knowledge Capture Conference, published 2021-12-02
DOI: 10.1145/3460210.3493547 (https://doi.org/10.1145/3460210.3493547)
Citations: 10
Abstract
Geographic and spatial information appears in numerous text documents and is a very challenging target for extraction. Geoparsing applications have been developed to extract geographic terms. However, off-the-shelf Named Entity Recognition (NER) models are mainly designed for toponym recognition and are very sensitive to language specificity. In this paper, we propose a workflow that first extracts geographic and spatial entities using a BiLSTM-CRF architecture with a concatenation of several text representations. We also propose a Relation Extraction module, aimed particularly at extracting spatial relationships, to build a structured geospatial knowledge base. We demonstrate our pipeline by applying it to French housing advertisements, which generally provide information about a property's location and neighbourhood. Our results show that the workflow handles the French language and the variability and irregularity of housing advertisements, generalizes geoparsing to all geographic and spatial terms, and successfully retrieves most of the relationships between entities from the text.
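To make the CRF half of a BiLSTM-CRF tagger concrete, the following is a minimal sketch of Viterbi decoding over per-token scores, followed by grouping BIO tags into entity spans. It is not the authors' implementation: the `LOC` label, the example tokens, and every emission and transition score are hypothetical values chosen for illustration, and the emission scores merely stand in for what a BiLSTM layer would produce.

```python
from typing import Dict, List, Tuple

def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence given per-token emission scores."""
    # best[t][tag] holds the best score of any path ending in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            # Pick the previous tag that maximizes path score plus transition score.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def bio_spans(tokens: List[str], tag_seq: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (text, label) entity spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tag_seq):
        if tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)          # continue the open entity
            continue
        if current:                        # close the open entity, if any
            spans.append((" ".join(current), label))
        current, label = ([token], tag[2:]) if tag.startswith("B-") else ([], None)
    if current:
        spans.append((" ".join(current), label))
    return spans

# Hypothetical scores standing in for BiLSTM outputs (assumption, not real data).
tags = ["O", "B-LOC", "I-LOC"]
tokens = ["proche", "de", "la", "Place", "Masséna"]
emissions = [
    {"O": 2.0, "B-LOC": 0.1, "I-LOC": 0.0},
    {"O": 2.0, "B-LOC": 0.0, "I-LOC": 0.0},
    {"O": 1.5, "B-LOC": 0.2, "I-LOC": 0.0},
    {"O": 0.5, "B-LOC": 1.5, "I-LOC": 1.0},
    {"O": 0.3, "B-LOC": 1.0, "I-LOC": 1.2},
]
# Transition scores: strongly penalize an I- tag that follows O.
transitions = {("O", "I-LOC"): -10.0, ("B-LOC", "I-LOC"): 1.0, ("I-LOC", "I-LOC"): 0.5}

predicted = viterbi(emissions, transitions, tags)
entities = bio_spans(tokens, predicted)
print(predicted)
print(entities)
```

The transition table is what distinguishes a CRF output layer from independent per-token classification: it lets the decoder reject sequences such as `O` followed by `I-LOC`, which can help on irregular text like housing advertisements.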