CHTopoNER model-based method for recognizing Chinese place names from social media information
Mengwei Zhang, Xingui Liu, Zheng Zhang, Yue Qiu, Zhipeng Jiang, Pengyu Zhang
Journal of Geographical Systems (journal article), published 2024-01-11
DOI: 10.1007/s10109-023-00433-w
Citations: 0
Abstract
Chinese toponym recognition is an important subtask of named entity recognition and has significant implications for improving geographic information systems. Because social media is updated in real time and carries rich geographical information, identifying the Chinese toponyms it contains, including compound toponyms, informal toponyms, and other forms that appear in social media content, is important for automatic geospatial information extraction. However, the strong word-building ability, diverse features, and ambiguity of Chinese toponyms, combined with the linguistic irregularities of social media, make it difficult to locate toponym boundaries accurately and to resolve ambiguities. Furthermore, existing Chinese toponym recognition methods often do not fuse local and global features during feature extraction, which leads to a loss of semantic information. We therefore used the Chinese-roberta-wwm-ext pre-trained language model to encode the input text and obtain character-level information. An improved SoftLexicon-based statistical method was employed to acquire word-level semantic information, which was then integrated with the character-level semantic information. A two-channel neural network layer, comprising a bi-directional long short-term memory (BiLSTM) network and an inception-dilated convolutional neural network, was used to extract global and local features, respectively, from the text. Finally, a conditional random field (CRF) was applied to enforce label constraints. The resulting deep neural network model, CHTopoNER, is designed to identify various forms of Chinese toponyms in irregular Chinese social media content. Its effectiveness was validated on four publicly available annotated toponym datasets and a custom social media dataset. CHTopoNER outperforms state-of-the-art Chinese toponym recognition models and achieves promising results in extracting various types of toponyms and spatial location terms.
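The SoftLexicon step can be made concrete with a small example. The sketch below is a generic SoftLexicon-style feature builder, not the improved statistical method used in CHTopoNER, whose details the abstract does not give. Every lexicon word matched in the sentence is assigned to the B (begin), M (middle), E (end), or S (single-character) set of each character it covers, and each set is pooled into one vector weighted by word frequency. The lexicon, frequencies, embeddings, and the function names match_bmes and soft_lexicon_features are hypothetical; the constant 4 in the weight mirrors the normalisation of the original SoftLexicon formulation as we understand it.

```python
# Generic SoftLexicon-style word features (a sketch, NOT CHTopoNER's improved
# statistical variant). Lexicon, frequencies, and embeddings are toy values.

def match_bmes(sentence, lexicon):
    """For each character, collect the lexicon words that cover it, split by
    the character's role in the word: B(egin), M(iddle), E(nd), S(ingle)."""
    n = len(sentence)
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].add(word)
            else:
                sets[i]["B"].add(word)
                for k in range(i + 1, j - 1):
                    sets[k]["M"].add(word)
                sets[j - 1]["E"].add(word)
    return sets


def soft_lexicon_features(sentence, lexicon, freq, emb, dim):
    """Pool each B/M/E/S set into a dim-sized vector, weighting each word by
    its frequency normalised over all words matched for that character, then
    concatenate the four pooled vectors (4 * dim values per character)."""
    features = []
    for char_sets in match_bmes(sentence, lexicon):
        matched = set().union(*char_sets.values())
        total = sum(freq.get(w, 0) for w in matched)
        vec = []
        for tag in "BMES":
            pooled = [0.0] * dim  # empty sets stay as zero vectors
            for w in char_sets[tag]:
                weight = 4 * freq.get(w, 0) / total if total else 0.0
                pooled = [p + weight * e for p, e in zip(pooled, emb[w])]
            vec.extend(pooled)
        features.append(vec)
    return features


# Toy example: "南京市长江大桥" (Nanjing Yangtze River Bridge).
sentence = "南京市长江大桥"
lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
freq = {w: 1 for w in lexicon}  # hypothetical uniform frequencies
emb = {"南京": [1.0, 0.0], "南京市": [0.0, 1.0], "市长": [1.0, 1.0],
       "长江": [0.5, 0.5], "大桥": [0.0, 2.0], "长江大桥": [2.0, 0.0]}

bmes = match_bmes(sentence, lexicon)
feats = soft_lexicon_features(sentence, lexicon, freq, emb, dim=2)
print(sorted(bmes[3]["B"]), sorted(bmes[3]["E"]))  # ['长江', '长江大桥'] ['市长']
print(feats[0])  # [2.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The character 长 is simultaneously the end of 市长 and the beginning of 长江 and 长江大桥, which is exactly the kind of boundary ambiguity the lexicon features expose to the model. In a full model, each character's 4 × dim vector would be concatenated with its character-level encoding (here, from Chinese-roberta-wwm-ext) before the BiLSTM and convolutional channels; the original SoftLexicon also uses a placeholder word for empty sets, which this sketch leaves as zeros.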
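How a CRF layer enforces label constraints can be illustrated with Viterbi decoding. The sketch below assumes a BIO tagging scheme with a single LOC type (the abstract does not state CHTopoNER's scheme) and uses hard transition constraints (0 for allowed, negative infinity for forbidden transitions) in place of the learned transition scores a trained CRF would have. The emission scores are made up and merely stand in for the output of the BiLSTM and inception-dilated CNN channels; the names TAGS, is_allowed, and viterbi are hypothetical.

```python
import math

# Hypothetical BIO tag set; CHTopoNER's actual labelling scheme may differ.
TAGS = ["O", "B-LOC", "I-LOC"]


def is_allowed(prev, curr):
    """BIO constraint: I-LOC may only continue a B-LOC or I-LOC span."""
    return curr != "I-LOC" or prev in ("B-LOC", "I-LOC")


# Hard constraints as scores: 0.0 if allowed, -inf if forbidden.
# A trained CRF would learn real-valued transition scores instead.
TRANSITIONS = [[0.0 if is_allowed(p, c) else -math.inf for c in TAGS] for p in TAGS]
START = [-math.inf if t == "I-LOC" else 0.0 for t in TAGS]


def viterbi(emissions, transitions=TRANSITIONS, start=START):
    """Return the highest-scoring tag sequence.

    emissions[t][j]   score of tag j at position t
    transitions[i][j] score of moving from tag i to tag j
    start[j]          score of starting the sequence with tag j
    """
    n_tags = len(TAGS)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers = []
    for row in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_tags):
            # Best previous tag for ending in tag j at this position.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + row[j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    path.reverse()
    return [TAGS[j] for j in path]


# Made-up emission scores for the characters of "去北京玩" ("go to Beijing to have fun").
emissions = [
    [2.0, 0.1, 0.0],  # 去
    [0.2, 1.0, 1.5],  # 北
    [0.1, 0.3, 2.0],  # 京
    [1.8, 0.2, 0.5],  # 玩
]
greedy = [TAGS[max(range(len(TAGS)), key=lambda j: r[j])] for r in emissions]
best = viterbi(emissions)
print(greedy)  # ['O', 'I-LOC', 'I-LOC', 'O'] (invalid BIO)
print(best)    # ['O', 'B-LOC', 'I-LOC', 'O']
```

Taking the per-character argmax labels 北 as I-LOC directly after O, which is not a valid BIO span; decoding over the constrained transitions instead recovers the well-formed span 北京 as B-LOC, I-LOC. Training a CRF additionally requires the forward algorithm to compute the partition function, which this decoding-only sketch omits.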
About the journal:
The Journal of Geographical Systems (JGS) is an interdisciplinary peer-reviewed academic journal that aims to encourage and promote high-quality scholarship on new theoretical or empirical results, models and methods in the social sciences. It solicits original papers with a spatial dimension that can be of interest to social scientists. Coverage includes regional science, economic geography, spatial economics, regional and urban economics, GIScience and GeoComputation, big data and machine learning. Spatial analysis, spatial econometrics and statistics are strongly represented.
One of the distinctive features of the journal is its concern for the interface between modeling, statistical techniques and spatial issues in a wide spectrum of related fields. An important goal of the journal is to encourage a spatial perspective in the social sciences that emphasizes geographical space as a relevant dimension to our understanding of socio-economic phenomena.
Contributions should be of high quality, be technically well crafted, make a substantial contribution to the subject, and contain a spatial dimension. The journal also aims to publish review and survey articles that make recent theoretical and methodological developments more readily accessible to its readers.
All papers in this journal undergo rigorous double-blind peer review, based on initial editor screening and involving at least two reviewers.
Officially cited as J Geogr Syst