The BioWhere Project: Unlocking the Potential of Biological Collections Data
Kristin Stock, K. Wijegunarathna, C. B. Jones, H. Morris, Pragyan Das, D. Medyckyj-Scott, Brandon Whitehead
GI_Forum, 2023. DOI: https://doi.org/10.1553/giscience2023_01_s3
Abstract
Vast numbers of biological specimens (e.g. flora, fauna, soils) are stored in collections globally. Many of these have only a natural-language location description, such as ‘200ft above and south of main highway, 1.1 miles west of Porters Pass’, and numerical coordinates are unknown. The BioWhere project is pioneering methods to automatically determine the geographic coordinates (georeferences) of complex location descriptions. Particular challenges are posed by the variable accuracy of recent and historical data that might be used to train models to predict geographic coordinates from the natural-language descriptions; by the presence of historical place names in the descriptions that are not stored in existing gazetteers; and by the vague and context-sensitive nature (e.g. above, on, south of) of the descriptions. We are addressing these challenges by extending the latest transformer-based deep learning models to parse locality descriptions, and to build models for specific spatial terms that incorporate geographic context and data quality to more accurately predict georeferences. We also describe a gazetteer that contains enriched cultural content to support georeferencing of historical records, and to serve as a store of New Zealand Māori cultural knowledge for future generations.
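To make the georeferencing task concrete, the fragment ‘1.1 miles west of Porters Pass’ can, in the simplest reading, be treated as a fixed distance and bearing from a known point. The following is a minimal sketch of that naive deterministic baseline, not the BioWhere method, which the abstract describes as learned, context-aware models. The function names, the bearing table, and the anchor coordinate are all assumptions introduced for illustration; in particular, the Porters Pass coordinate is an approximate placeholder, not a value taken from a gazetteer.

```python
import math

EARTH_RADIUS_M = 6_371_008.8   # mean Earth radius in metres (spherical approximation)
METRES_PER_MILE = 1609.344

# Assumed mapping from cardinal direction terms to bearings
# (degrees clockwise from north). Hypothetical, for illustration only.
BEARINGS = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}


def offset_point(lat_deg, lon_deg, distance_m, bearing_deg):
    """Return the (lat, lon) reached by travelling distance_m along a
    great circle from the start point with the given initial bearing."""
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    theta = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    # Normalise longitude to the range [-180, 180).
    lon2_deg = (math.degrees(lon2) + 540.0) % 360.0 - 180.0
    return math.degrees(lat2), lon2_deg


def naive_georeference(anchor, miles, direction):
    """Interpret '<miles> miles <direction> of <anchor>' as a point offset."""
    lat, lon = anchor
    return offset_point(lat, lon, miles * METRES_PER_MILE, BEARINGS[direction])


# Approximate placeholder for Porters Pass (assumption, not gazetteer data).
PORTERS_PASS_APPROX = (-43.30, 171.74)

estimate = naive_georeference(PORTERS_PASS_APPROX, 1.1, "west")
print(estimate)
```

A single point estimate like this ignores the rest of the description (‘200ft above and south of main highway’) and treats west as an exact bearing, which is precisely the vagueness and context sensitivity the abstract identifies as a challenge.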