Automation of data preparation for mapping using natural language processing systems

InterCarto InterGIS, Pub Date: 2022-01-01, DOI: 10.35595/2414-9179-2022-1-28-659-669

A. Kolesnikov, Egor Plitchenko, Maria Kropacheva

Citations: 1

Abstract

The current level of information technology makes it possible to automate the processing of data types that previously only a specialist could work with. One example is natural language processing technology, which implements sentiment analysis, machine translation, and question-answering systems. For creating cartographic and geoinformation products, two methods are of greatest interest: named entity extraction, which extracts geographical names from unstructured text, and named entity linking, which creates logical links between the extracted names of spatial objects. Processing the extracted names through a local or networked geocoding service database will automate the creation of map layers in a geographic information system from text messages. The article describes the most popular approaches to named entity extraction and their software implementations, using the biographies and works of Siberian writers as example texts. Rule-based methodologies, maximum entropy models, and convolutional neural networks are analyzed. To assess the quality of extraction of geographical names and objects from text, the authors propose, in addition to the standard F1-score, a variant of the evaluation method that takes a larger number of criteria into account and is likewise based on an error matrix. Text block markup formats are also described; through additional training of the neural network model, they make it possible to improve recognition quality and expand the range of geographical names recognized as named entities.
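As a rough illustration of the standard F1-score mentioned in the abstract, the sketch below derives precision, recall, and F1 from error-matrix counts (true positives, false positives, false negatives) for a set of extracted place names. It does not reproduce the authors' additional evaluation variant, which the abstract does not specify. The function names, the exact-string matching rule, and the sample toponyms are assumptions made only for this example.

```python
def f1_from_counts(tp, fp, fn):
    """Precision, recall and F1 from error-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def score_toponyms(gold, predicted):
    """Compare extracted place names against a manually annotated set.

    Assumption: a prediction counts as correct only on an exact string
    match; real evaluations often also check entity type and text span.
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # names found and present in the annotation
    fp = len(predicted - gold)   # names found but not annotated
    fn = len(gold - predicted)   # annotated names that were missed
    scores = {"tp": tp, "fp": fp, "fn": fn}
    scores.update(f1_from_counts(tp, fp, fn))
    return scores


# Hypothetical sample data, not taken from the article.
gold = {"Novosibirsk", "Tomsk", "Irkutsk", "Ob"}
predicted = {"Novosibirsk", "Tomsk", "Baikal"}
result = score_toponyms(gold, predicted)
print(result)
```

With these sample sets there are two correct names, one spurious name, and two missed names, so precision is 2/3, recall is 1/2, and F1 is 4/7.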
