A hybrid knowledge graph for efficient exploration of lithostratigraphic information in open text data
Wenjia Li, Xiaogang Ma, Xinqing Wang, Liang Wu, Sanaz Salati, Zhong Xie
Applied Computing and Geosciences, Volume 22, Article 100164. Published 2024-04-11. DOI: 10.1016/j.acags.2024.100164
https://www.sciencedirect.com/science/article/pii/S2590197424000119
Citations: 0
Abstract
Rocks formed during different geologic times record the diverse evolution of the geosphere and biosphere. In the past decades, substantial geoscience data have been made open access, providing invaluable resources for studying stratigraphy in different regions and at different scales. However, many open datasets record information in natural language with heterogeneous terminologies, and efficient approaches for analyzing them are lacking. In this research, we constructed a hybrid Stratigraphic Knowledge Graph (StraKG) to help address this challenge. StraKG has two layers: a simple schema layer and a rich instance layer. For the schema layer, we used a short but functional list of classes and relationships, and then incorporated community-recognized terminologies from geological dictionaries. For the instance layer, we used natural language processing techniques to analyze open text data and obtained a large number of records, such as rocks and spatial locations. The nodes in the two layers were linked to establish a consistent structure of stratigraphic knowledge. To verify the functionality of StraKG, we applied it to the Baidu Encyclopedia, the largest online Chinese encyclopedia. Three experiments were carried out on the topics of stratigraphic correlation, the spatial distribution of ophiolite in China, and the spatio-temporal distribution of open lithostratigraphic data. The results show that StraKG can provide a strong knowledge reference for stratigraphic studies. Used together with data exploration and data mining methods, StraKG illustrates a new approach to analyzing big, open text data in geoscience.
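The two-layer idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class `HybridGraph`, the class names (`LithostratigraphicUnit`, `Rock`, `Location`), the relation names (`contains_rock`, `located_in`), and the placeholder instances (`Formation A`, `Region X`) are all hypothetical and chosen only for illustration. The sketch assumes that a schema layer declares which classes and relationships are allowed, and that every instance is tied to a schema class so that instance-level links can be checked against the schema.

```python
from dataclasses import dataclass, field


@dataclass
class HybridGraph:
    """Toy two-layer graph: schema classes plus instances typed by them."""
    schema_classes: set = field(default_factory=set)
    schema_relations: set = field(default_factory=set)  # (domain, name, range)
    instance_types: dict = field(default_factory=dict)  # instance -> class
    triples: list = field(default_factory=list)         # (subject, name, object)

    def add_class(self, name):
        self.schema_classes.add(name)

    def add_relation(self, domain, name, range_):
        # A relation may only connect classes already declared in the schema.
        for cls in (domain, range_):
            if cls not in self.schema_classes:
                raise ValueError(f"unknown class: {cls}")
        self.schema_relations.add((domain, name, range_))

    def add_instance(self, name, cls):
        # Associating an instance with a schema class links the two layers.
        if cls not in self.schema_classes:
            raise ValueError(f"unknown class: {cls}")
        self.instance_types[name] = cls

    def link(self, subject, name, obj):
        # Accept an instance-level link only if the schema allows it.
        signature = (self.instance_types[subject], name, self.instance_types[obj])
        if signature not in self.schema_relations:
            raise ValueError(f"relation not allowed by schema: {signature}")
        self.triples.append((subject, name, obj))

    def objects(self, subject, name):
        return [o for s, n, o in self.triples if s == subject and n == name]


def build_example():
    """Build a tiny graph with hypothetical placeholder data."""
    g = HybridGraph()
    for cls in ("LithostratigraphicUnit", "Rock", "Location"):
        g.add_class(cls)
    g.add_relation("LithostratigraphicUnit", "contains_rock", "Rock")
    g.add_relation("LithostratigraphicUnit", "located_in", "Location")
    g.add_instance("Formation A", "LithostratigraphicUnit")
    g.add_instance("sandstone", "Rock")
    g.add_instance("Region X", "Location")
    g.link("Formation A", "contains_rock", "sandstone")
    g.link("Formation A", "located_in", "Region X")
    return g
```

Calling `build_example().objects("Formation A", "located_in")` returns the locations linked to the placeholder unit. Checking each link against the schema is one simple way to keep heterogeneous, text-derived records in a consistent structure; a real system would more likely use a graph database or an RDF store.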