Automation of data preparation for mapping using natural language processing systems

A. Kolesnikov, Egor Plitchenko, Maria Kropacheva
InterCarto InterGIS, journal article, published 2022-01-01. DOI: 10.35595/2414-9179-2022-1-28-659-669 (https://doi.org/10.35595/2414-9179-2022-1-28-659-669)
Citations: 1

Abstract

Advances in information technology now make it possible to automate the processing of data types that previously required a specialist. One example is natural language processing, which implements functions such as sentiment analysis, machine translation, and question answering. For cartographic and geoinformation production, two methods are of greatest interest: named entity extraction, which pulls geographical names out of unstructured text, and named entity linking, which establishes logical links between the extracted names of spatial objects. Passing the extracted names through a geocoding service backed by a local or network database makes it possible to automate the creation of map layers in a geographic information system from text messages. The article describes the most popular approaches to named entity extraction and their software implementations, using the biographies and works of Siberian writers as example texts. Rule-based methods, maximum entropy models, and convolutional neural networks are analyzed. To assess the quality of extracting geographical names and objects from text, the authors propose, in addition to the standard F1-score, an evaluation method that takes more criteria into account and is likewise based on an error matrix (confusion matrix). Text block markup formats are also described; they are used for additional training of the neural network model, which improves recognition quality and expands the range of geographical names recognized as named entities.
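The workflow outlined in the abstract (extract geographical names, geocode them, build a map layer, score the result) can be illustrated with a minimal sketch. This is not the authors' implementation: the gazetteer, its approximate coordinates, the sample sentence, and all function names are hypothetical. It stands in for only the simplest rule-based approach, using a local dictionary in place of a geocoding service, and it computes only the standard F1-score from error-matrix counts rather than the extended evaluation the authors propose.

```python
import json
import re

# Hypothetical local gazetteer standing in for a geocoding database.
# Values are approximate (longitude, latitude) pairs, for illustration only.
GAZETTEER = {
    "Novosibirsk": (82.92, 55.03),
    "Tomsk": (84.95, 56.48),
    "Irkutsk": (104.28, 52.29),
}


def extract_toponyms(text, gazetteer):
    """Rule-based extraction: gazetteer names found as whole words,
    deduplicated, in order of first appearance."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, gazetteer)) + r")\b")
    found = []
    for match in pattern.finditer(text):
        name = match.group(1)
        if name not in found:
            found.append(name)
    return found


def to_geojson(names, gazetteer):
    """Geocode names via the gazetteer and build a GeoJSON point layer."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": list(gazetteer[name])},
            "properties": {"name": name},
        }
        for name in names
    ]
    return {"type": "FeatureCollection", "features": features}


def precision_recall_f1(predicted, expected):
    """Standard scores from error-matrix counts (TP, FP, FN) over name sets."""
    predicted, expected = set(predicted), set(expected)
    tp = len(predicted & expected)   # correctly extracted names
    fp = len(predicted - expected)   # extracted but not in the reference
    fn = len(expected - predicted)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sample sentence and hand-made reference annotation.
text = ("The writer was born in Tomsk, studied in Novosibirsk "
        "and later moved to Krasnoyarsk.")
found = extract_toponyms(text, GAZETTEER)
layer = to_geojson(found, GAZETTEER)
scores = precision_recall_f1(found, ["Tomsk", "Novosibirsk", "Krasnoyarsk"])

print(json.dumps(layer, indent=2))
print(scores)
```

Because "Krasnoyarsk" is missing from the toy gazetteer, it counts as a false negative. This is the typical weakness of purely dictionary-based rules, and it is one reason statistical and neural models are compared against them.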