Transformer based named entity recognition for place name extraction from unstructured text

Impact factor 4.3; Region 1 (Earth Sciences); JCR Q1 (Computer Science, Information Systems)
Cillian Berragan, A. Singleton, A. Calafiore, J. Morley
Journal: International Journal of Geographical Information Science, vol. 37, pp. 747-766
Published: 2022-10-17
Publication type: Journal Article
DOI: 10.1080/13658816.2022.2133125 (https://doi.org/10.1080/13658816.2022.2133125)
Citations: 9

Abstract

Place names embedded in online natural language text present a useful source of geographic information. Despite this, many methods for the extraction of place names from text use pre-trained models that were not explicitly designed for this task. Our paper builds five custom-built Named Entity Recognition (NER) models and evaluates them against three popular pre-built models for place name extraction. The models are evaluated using a set of manually annotated Wikipedia articles with reference to the F1 score metric. Our best performing model achieves an F1 score of 0.939 compared with 0.730 for the best performing pre-built model. Our model is then used to extract all place names from Wikipedia articles in Great Britain, demonstrating the ability to more accurately capture unknown place names from volunteered sources of online geographic information.
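
The abstract compares models by F1 score but does not say how predictions are matched to the manual annotations. As a rough illustration only, the sketch below computes precision, recall and F1 under exact-match span scoring, which is an assumption rather than the paper's stated protocol. The `span_f1` helper, the `PLACE` label and the example spans are all hypothetical.

```python
from typing import Set, Tuple

# A span is (start_token, end_token, label); this layout is an assumption.
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match span scoring."""
    # A prediction counts only if its boundaries and label match a gold span.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one passage; not data from the paper.
gold = {(0, 1, "PLACE"), (5, 7, "PLACE"), (10, 11, "PLACE")}
predicted = {(0, 1, "PLACE"), (5, 6, "PLACE"), (10, 11, "PLACE"), (14, 15, "PLACE")}

precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under exact matching, the prediction `(5, 6, "PLACE")` earns no credit even though it overlaps a gold span. Partial-match schemes would score it differently, so reported F1 values are only comparable when the matching rule is the same.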
Source journal
CiteScore: 11.00
Self-citation rate: 7.00%
Articles published: 81
Review time: 9 months
About the journal: International Journal of Geographical Information Science provides a forum for the exchange of original ideas, approaches, methods and experiences in the rapidly growing field of geographical information science (GIScience). It is intended to interest those who research fundamental and computational issues of geographic information, as well as issues related to the design, implementation and use of geographical information for monitoring, prediction and decision making. Published research covers innovations in GIScience and novel applications of GIScience in natural resources, social systems and the built environment, as well as relevant developments in computer science, cartography, surveying, geography and engineering in both developed and developing countries.