基于 CHTopoNER 模型的从社交媒体信息中识别中文地名的方法

IF 2.8 3区 地球科学 Q1 GEOGRAPHY
Mengwei Zhang, Xingui Liu, Zheng Zhang, Yue Qiu, Zhipeng Jiang, Pengyu Zhang
{"title":"基于 CHTopoNER 模型的从社交媒体信息中识别中文地名的方法","authors":"Mengwei Zhang, Xingui Liu, Zheng Zhang, Yue Qiu, Zhipeng Jiang, Pengyu Zhang","doi":"10.1007/s10109-023-00433-w","DOIUrl":null,"url":null,"abstract":"<p>Chinese toponym recognition is crucial in named entity recognition and has significant implications for improving geographic information systems. Based on the real-time nature of social media and rich geographical data contained in social media, it is important to identify Chinese toponyms, including compound toponyms, informal toponyms, and other forms of social media content, for automatic geospatial information extraction. However, the strong word-building ability, diverse features, and ambiguity of Chinese toponyms combined with the linguistic irregularities of social media pose significant challenges for accurately locating toponym boundaries and resolving ambiguities. Furthermore, existing Chinese toponym recognition methods often ignore the fusion of local and global features during feature extraction, resulting in semantic information loss. Therefore, we used the Chinese-roberta-wwm-ext pre-trained language model to encode input text and obtain character-level information. An improved SoftLexicon-based statistical method was employed to acquire word-level semantic information, which was then integrated with character-level semantic information. A two-channel neural network layer comprising a bi-directional long short-term memory and an inception-dilated convolutional neural network was utilized to extract global and local features from text. Additionally, a conditional random field was applied to establish label constraints. The proposed deep neural network model, called CHTopoNER, is designed to identify various forms of Chinese toponyms in irregular Chinese social media content. Its effectiveness was validated on four publicly available annotated toponym datasets and a custom social media dataset. CHTopoNER surpasses state-of-the-art Chinese toponym recognition models and achieves promising results for extracting various types of toponyms and spatial location terms.</p>","PeriodicalId":47245,"journal":{"name":"Journal of Geographical Systems","volume":"82 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"CHTopoNER model-based method for recognizing Chinese place names from social media information\",\"authors\":\"Mengwei Zhang, Xingui Liu, Zheng Zhang, Yue Qiu, Zhipeng Jiang, Pengyu Zhang\",\"doi\":\"10.1007/s10109-023-00433-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Chinese toponym recognition is crucial in named entity recognition and has significant implications for improving geographic information systems. Based on the real-time nature of social media and rich geographical data contained in social media, it is important to identify Chinese toponyms, including compound toponyms, informal toponyms, and other forms of social media content, for automatic geospatial information extraction. However, the strong word-building ability, diverse features, and ambiguity of Chinese toponyms combined with the linguistic irregularities of social media pose significant challenges for accurately locating toponym boundaries and resolving ambiguities. Furthermore, existing Chinese toponym recognition methods often ignore the fusion of local and global features during feature extraction, resulting in semantic information loss. Therefore, we used the Chinese-roberta-wwm-ext pre-trained language model to encode input text and obtain character-level information. An improved SoftLexicon-based statistical method was employed to acquire word-level semantic information, which was then integrated with character-level semantic information. A two-channel neural network layer comprising a bi-directional long short-term memory and an inception-dilated convolutional neural network was utilized to extract global and local features from text. Additionally, a conditional random field was applied to establish label constraints. The proposed deep neural network model, called CHTopoNER, is designed to identify various forms of Chinese toponyms in irregular Chinese social media content. Its effectiveness was validated on four publicly available annotated toponym datasets and a custom social media dataset. CHTopoNER surpasses state-of-the-art Chinese toponym recognition models and achieves promising results for extracting various types of toponyms and spatial location terms.</p>\",\"PeriodicalId\":47245,\"journal\":{\"name\":\"Journal of Geographical Systems\",\"volume\":\"82 1\",\"pages\":\"\"},\"PeriodicalIF\":2.8000,\"publicationDate\":\"2024-01-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Geographical Systems\",\"FirstCategoryId\":\"89\",\"ListUrlMain\":\"https://doi.org/10.1007/s10109-023-00433-w\",\"RegionNum\":3,\"RegionCategory\":\"地球科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"GEOGRAPHY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Geographical Systems","FirstCategoryId":"89","ListUrlMain":"https://doi.org/10.1007/s10109-023-00433-w","RegionNum":3,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"GEOGRAPHY","Score":null,"Total":0}
引用次数: 0

摘要

中文地名识别是命名实体识别的关键,对改进地理信息系统具有重要意义。基于社交媒体的实时性和社交媒体中包含的丰富地理数据,识别中文地名(包括复合地名、非正式地名和其他形式的社交媒体内容)对于地理空间信息的自动提取非常重要。然而,中文地名造词能力强、特征多样、模糊性强,再加上社交媒体的语言不规则性,给准确定位地名边界和解决模糊问题带来了巨大挑战。此外,现有的中文地名识别方法在提取特征时往往忽略了局部特征和全局特征的融合,导致语义信息丢失。因此,我们使用预先训练好的中文-roberta-wwm-ext 语言模型对输入文本进行编码,获取字符级信息。我们采用了一种基于 SoftLexicon 的改进统计方法来获取词级语义信息,然后将其与字符级语义信息进行整合。双通道神经网络层由双向长短期记忆和初始稀释卷积神经网络组成,用于从文本中提取全局和局部特征。此外,还应用了条件随机场来建立标签约束。所提出的深度神经网络模型名为 CHTopoNER,旨在识别不规范中文社交媒体内容中各种形式的中文地名。该模型的有效性在四个公开的地名注释数据集和一个定制的社交媒体数据集上得到了验证。CHTopoNER 超越了最先进的中文地名识别模型,在提取各种类型的地名和空间位置术语方面取得了可喜的成果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

CHTopoNER model-based method for recognizing Chinese place names from social media information

CHTopoNER model-based method for recognizing Chinese place names from social media information

Chinese toponym recognition is crucial in named entity recognition and has significant implications for improving geographic information systems. Based on the real-time nature of social media and rich geographical data contained in social media, it is important to identify Chinese toponyms, including compound toponyms, informal toponyms, and other forms of social media content, for automatic geospatial information extraction. However, the strong word-building ability, diverse features, and ambiguity of Chinese toponyms combined with the linguistic irregularities of social media pose significant challenges for accurately locating toponym boundaries and resolving ambiguities. Furthermore, existing Chinese toponym recognition methods often ignore the fusion of local and global features during feature extraction, resulting in semantic information loss. Therefore, we used the Chinese-roberta-wwm-ext pre-trained language model to encode input text and obtain character-level information. An improved SoftLexicon-based statistical method was employed to acquire word-level semantic information, which was then integrated with character-level semantic information. A two-channel neural network layer comprising a bi-directional long short-term memory and an inception-dilated convolutional neural network was utilized to extract global and local features from text. Additionally, a conditional random field was applied to establish label constraints. The proposed deep neural network model, called CHTopoNER, is designed to identify various forms of Chinese toponyms in irregular Chinese social media content. Its effectiveness was validated on four publicly available annotated toponym datasets and a custom social media dataset. CHTopoNER surpasses state-of-the-art Chinese toponym recognition models and achieves promising results for extracting various types of toponyms and spatial location terms.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
5.40
自引率
6.90%
发文量
33
期刊介绍: The Journal of Geographical Systems (JGS) is an interdisciplinary peer-reviewed academic journal that aims to encourage and promote high-quality scholarship on new theoretical or empirical results, models and methods in the social sciences. It solicits original papers with a spatial dimension that can be of interest to social scientists. Coverage includes regional science, economic geography, spatial economics, regional and urban economics, GIScience and GeoComputation, big data and machine learning. Spatial analysis, spatial econometrics and statistics are strongly represented. One of the distinctive features of the journal is its concern for the interface between modeling, statistical techniques and spatial issues in a wide spectrum of related fields. An important goal of the journal is to encourage a spatial perspective in the social sciences that emphasizes geographical space as a relevant dimension to our understanding of socio-economic phenomena. Contributions should be of high-quality, be technically well-crafted, make a substantial contribution to the subject and contain a spatial dimension. The journal also aims to publish, review and survey articles that make recent theoretical and methodological developments more readily accessible to the audience of the journal. All papers of this journal have undergone rigorous double-blind peer-review, based on initial editor screening and with at least two peer reviewers. Officially cited as J Geogr Syst
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信