Deep learning-based mineral exploration named entity recognition: A case study of granitic pegmatite-type lithium deposits

IF 3.2, Zone 2 (Earth Sciences), Q1 GEOLOGY
Jintao Tao, Nannan Zhang, Jinyu Chang, Li Chen, Hao Zhang, Shibin Liao, Siyuan Li
Journal: Ore Geology Reviews, volume 175, Article 106367
Published: 2024-11-24
DOI: 10.1016/j.oregeorev.2024.106367
URL: https://www.sciencedirect.com/science/article/pii/S0169136824005006
Citations: 0

Abstract

Geological text data are a crucial source of geological information and knowledge for mineral exploration. Mineral exploration involves predicting and detecting mineral resources using geological, geochemical, geophysical, and remote sensing data. However, existing named entity recognition (NER) studies on mineral deposits have focused mainly on geological environments and mineral deposit models, which neither capture the extensive knowledge essential for mineral exploration nor support subsequent exploration efforts. This paper presents an efficient workflow for automatically extracting mineral exploration information from unstructured geological text data using a deep learning method. First, 21 entity types were identified based on a conceptual prospecting model of granitic pegmatite-type lithium deposits. A mineral exploration corpus comprising 3,386 sentences and 13,167 entities was then constructed from Chinese geological literature and reports. Next, a Mineral Exploration Named Entity Recognition (MENER) model is proposed to extract mineral exploration information. The model integrates entity-type-enhanced character features, word features, and contextual features to improve performance. A Bidirectional Encoder Representations from Transformers (BERT) model is used to obtain character embeddings of the input text. The mineral exploration entity types provide external knowledge that, through multi-head attention, aids the understanding of entity semantics within sentences. A convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network extract word and contextual features and capture additional structural information. Because geological entity names and expressions follow certain default conventions and paradigms, a boundary prediction classifier is introduced to identify the head and tail characteristics of geological entities. A conditional random field (CRF) is then used to classify the entities. The MENER model achieved an average F1-score of 79.69% on the constructed dataset. Finally, a geological document was selected as a case study to demonstrate the effectiveness of the proposed model. The workflow outlined in this study enables rapid and robust extraction of specific information and knowledge mining from geological text data, with potential applications across various domains.
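
The abstract does not describe the labelling scheme or how the F1-score was averaged, so the following is only a minimal sketch of how head/tail boundary targets and an entity-level F1 could be derived from character-level tag sequences. It assumes BIO tagging and micro-averaging over exact span matches. The entity type names `ROCK` and `MINERAL` and all function names are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index inclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((start, i - 1, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def boundary_labels(tags: List[str]) -> Tuple[List[int], List[int]]:
    """Binary head/tail targets: 1 where an entity starts or ends."""
    head = [0] * len(tags)
    tail = [0] * len(tags)
    for s, e, _ in bio_to_spans(tags):
        head[s] = 1
        tail[e] = 1
    return head, tail


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (start, end, type) span matches."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_to_spans(g), bio_to_spans(p)
        tp += len(gs & ps)
        fp += len(ps - gs)
        fn += len(gs - ps)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical six-character sentence with two gold entities.
gold_tags = ["B-ROCK", "I-ROCK", "O", "B-MINERAL", "I-MINERAL", "I-MINERAL"]
# The prediction truncates the second entity by one character.
pred_tags = ["B-ROCK", "I-ROCK", "O", "B-MINERAL", "I-MINERAL", "O"]

head, tail = boundary_labels(gold_tags)
print(head, tail)
print(entity_f1([gold_tags], [pred_tags]))
```

In this toy case the truncated second entity counts as both a false positive and a false negative under exact matching. Errors of this kind, at entity edges, are what a boundary classifier trained on head/tail targets would be expected to reduce.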


Source journal
Ore Geology Reviews (Earth Sciences: Geology)
CiteScore: 6.50
Self-citation rate: 27.30%
Articles published: 546
Review time: 22.9 weeks
Journal description: Ore Geology Reviews aims to familiarize all earth scientists with recent advances in a number of interconnected disciplines related to the study of, and search for, ore deposits. The reviews range from brief to longer contributions, but the journal preferentially publishes manuscripts that fill the niche between the commonly shorter journal articles and the comprehensive book coverages, and thus has a special appeal to many authors and readers.