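A geological knowledge-constrained entity and relation extraction method for text: A case study of granitic pegmatite-type lithium deposits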

Impact factor 4.2; Zone 2, Earth Sciences; JCR Q1, Computer Science, Interdisciplinary Applications
Jintao Tao, Nannan Zhang, Jinyu Chang, Li Chen, Hao Zhang, Shibin Liao, Siyuan Li, Jianpeng Jing
Journal: Computers & Geosciences, Volume 200, Article 105920
Published: 2025-03-20
DOI: 10.1016/j.cageo.2025.105920
URL: https://www.sciencedirect.com/science/article/pii/S0098300425000706
Citations: 0

Abstract

Geological text data contain rich and valuable information about geological environments and mineral deposits. Automated extraction of geological information from these unstructured texts is crucial for constructing geological knowledge graphs and facilitating knowledge discovery. Numerous studies have introduced methods for geological entity and relation extraction from different perspectives. Although many of these studies use geological ontologies or schemas for data labeling, fewer have explicitly examined how these frameworks can constrain and improve the information extraction process. In this study, we propose a Geological Knowledge-constrained Entity and Relation Extraction (GKERE) method that incorporates a geological schema into the extraction process. The GKERE method uses the Robustly Optimized Bidirectional Encoder Representations from Transformers Pretraining Approach (RoBERTa) to generate character embeddings from geological sentences. It first applies a span-based named entity recognition model to identify entities, then generates entity pairs and predicts their relations under the constraints of the geological schema. The schema filters out redundant entity pairs and provides information about the types of head and tail entities and the relations possible between them, which guides the relation extraction step. To validate the method, we conducted a case study on granitic pegmatite-type lithium deposits. A geological schema was designed, comprising 22 entity types, 16 relationships, and 184 knowledge rules. An entity-relation extraction dataset was then constructed from 68 geological journal articles and four mineral exploration reports. The proposed GKERE method achieves an F1-score of 75.82% on this dataset, outperforming several baseline models. Results show that the GKERE method significantly enhances geological entity and relation extraction. Introducing the geological schema both accelerates computation and improves model accuracy, making this approach effective for extracting geological information from large-scale textual data and promoting geological knowledge discovery.
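The schema-based filtering of entity pairs described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The entity types, relation names, rule table, and the `candidate_pairs` helper are all hypothetical, chosen only for illustration; the paper's actual schema (22 entity types, 16 relationships, 184 knowledge rules) is not listed on this page. The sketch assumes that each rule maps a (head type, tail type) combination to the relations allowed between them, and that pairs with no matching rule are dropped before relation classification.

```python
from itertools import permutations
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    type: str


# Hypothetical rules: (head type, tail type) -> relations allowed between them.
# These names are illustrative and are not taken from the paper's schema.
SCHEMA = {
    ("Deposit", "Mineral"): {"contains"},
    ("Deposit", "Rock"): {"hosted_in"},
    ("Rock", "Mineral"): {"contains"},
}


def candidate_pairs(entities, schema=SCHEMA):
    """Keep only ordered (head, tail) pairs whose type combination has a rule."""
    kept = []
    for head, tail in permutations(entities, 2):
        allowed = schema.get((head.type, tail.type))
        if allowed:
            # A relation classifier would only need to choose among `allowed`.
            kept.append((head, tail, sorted(allowed)))
    return kept


# Entities as a span-based NER step might return them (made-up example).
entities = [
    Entity("Deposit A", "Deposit"),
    Entity("spodumene", "Mineral"),
    Entity("pegmatite", "Rock"),
]
pairs = candidate_pairs(entities)
for head, tail, relations in pairs:
    print(f"{head.text} -> {tail.text}: {relations}")
```

With three entities there are six ordered pairs, and under these assumed rules only three survive. This is the sense in which a schema can reduce the number of pairs passed to the relation classifier and narrow the labels it has to choose from.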
Source journal
Computers & Geosciences (Geosciences, Multidisciplinary)
CiteScore: 9.30
Self-citation rate: 6.80%
Articles published: 164
Review time: 3.4 months
Journal description: Computers & Geosciences publishes high impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.