MRMinerals and MineralTD: Machine-Readable Mineral Formula and Compositions Data Set for Data-Driven Research

Journal metrics: impact factor 2.4; CAS Zone 3 (Earth Sciences); JCR Q2 (GEOSCIENCES, MULTIDISCIPLINARY)
Tamanna, Dominik C. Hezel, Horst R. Marschall
Geoscience Data Journal, Volume 12, Issue 4. Journal article, published 2025-10-02.
DOI: 10.1002/gdj3.70036
Article page: https://rmets.onlinelibrary.wiley.com/doi/10.1002/gdj3.70036
PDF: https://rmets.onlinelibrary.wiley.com/doi/epdf/10.1002/gdj3.70036
Citations: 0

Abstract

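Artificial intelligence (AI) is increasingly applied in the geosciences, particularly in fields such as mineralogy, where it supports tasks such as mineral classification, automated thin-section image analysis, and mineral exploration targeting. Such tasks require large, structured, and standardized data sets, which are currently not available. We built two databases to fill this gap. (i) MRMinerals lists the 400 most common and geologically significant minerals, including major rock-forming minerals, key accessory minerals, and economically important ore minerals; its key feature is a machine-readable formula for each mineral. (ii) MineralTD is a large training data set with more than 10,000 compositions for each of the 400 minerals in MRMinerals. MineralTD is split into two subdatasets, MineralTDMeasured and MineralTDSynthetic. MineralTDMeasured contains approximately 140,000 mineral compositions from the open-access geochemical databases and repositories GEOROC, Pangaea, PetDB, RRUFF, and ESMD. MineralTDSynthetic contains synthetic mineral compositions generated from the machine-readable formulas in MRMinerals, with at least 10,000 compositions per mineral. MineralTD is annotated with metadata, such as mineral frequency, rock classification, data source, and the methods used, so that each individual data set can be fully understood. MRMinerals and MineralTD are ready-to-use, open-access data sets that enable scalable, data-driven research in mineralogy, such as machine-learning (ML) applications.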
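The abstract does not specify how MRMinerals encodes its formulas or how MineralTDSynthetic samples compositions. The sketch below is a hypothetical illustration of the general idea, not the authors' method. It assumes a formula is stored as a Python dict of cation to atoms per formula unit (apfu), that every cation is reported as one fixed oxide (all iron as FeO), and that synthetic olivine, (Mg,Fe)2SiO4, is drawn with a uniformly distributed Mg/(Mg+Fe) ratio. The names `formula_to_oxide_wt`, `synthetic_olivine`, and `CATION_TO_OXIDE` are invented for this example.

```python
import random

# Oxide molar masses in g/mol, computed from standard atomic weights.
OXIDE_MOLAR_MASS = {
    "SiO2": 60.084,
    "MgO": 40.304,
    "FeO": 71.844,
    "Al2O3": 101.961,
}

# HYPOTHETICAL convention: each cation is reported as one fixed oxide
# (all iron as FeO); the int is the number of cations in that oxide formula.
CATION_TO_OXIDE = {
    "Si": ("SiO2", 1),
    "Mg": ("MgO", 1),
    "Fe": ("FeO", 1),
    "Al": ("Al2O3", 2),
}


def formula_to_oxide_wt(formula: dict[str, float]) -> dict[str, float]:
    """Convert atoms per formula unit (apfu) into oxide wt%, normalized to 100.

    `formula` is an ASSUMED machine-readable form, e.g. {"Mg": 2, "Si": 1}
    for forsterite; it is not the MRMinerals format.
    """
    masses: dict[str, float] = {}
    for cation, apfu in formula.items():
        oxide, cations_per_oxide = CATION_TO_OXIDE[cation]
        moles_oxide = apfu / cations_per_oxide
        masses[oxide] = masses.get(oxide, 0.0) + moles_oxide * OXIDE_MOLAR_MASS[oxide]
    total = sum(masses.values())
    return {oxide: 100.0 * mass / total for oxide, mass in masses.items()}


def synthetic_olivine(n: int, seed: int = 0) -> list[dict[str, float]]:
    """Draw n hypothetical olivine, (Mg,Fe)2SiO4, compositions in oxide wt%.

    The Mg/(Mg+Fe) ratio is sampled uniformly on [0, 1); no analytical noise
    or minor elements are added.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x_mg = rng.random()  # 0 = fayalite end-member, 1 = forsterite end-member
        formula = {"Mg": 2.0 * x_mg, "Fe": 2.0 * (1.0 - x_mg), "Si": 1.0}
        samples.append(formula_to_oxide_wt(formula))
    return samples


if __name__ == "__main__":
    forsterite = formula_to_oxide_wt({"Mg": 2.0, "Si": 1.0})
    print({oxide: round(wt, 2) for oxide, wt in forsterite.items()})
    for composition in synthetic_olivine(3):
        print({oxide: round(wt, 2) for oxide, wt in composition.items()})
```

For end-member forsterite, Mg2SiO4, the conversion gives about 57.29 wt% MgO and 42.71 wt% SiO2; for fayalite, Fe2SiO4, about 70.51 wt% FeO. A uniform draw along a binary join is the simplest possible choice; a realistic generator would likely also need crystal-site constraints, minor-element substitutions, and analytical noise.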

Source journal: Geoscience Data Journal
Categories: GEOSCIENCES, MULTIDISCIPLINARY; METEOROLOGY & ATMOSPHERIC SCIENCES
CiteScore: 5.90
Self-citation rate: 9.40%
Articles published: 35
Review time: 4 weeks
Journal description: Geoscience Data Journal provides an Open Access platform where scientific data can be formally published with scientific peer review. The dataset creator thus attains full credit for their efforts, while also improving the scientific record, providing version control for the community, and allowing major datasets to be fully described, cited, and discovered. An online-only journal, GDJ publishes short data papers cross-linked to, and citing, datasets that have been deposited in approved data centres and awarded DOIs. The journal also accepts articles on data services and articles that support and inform data-publishing best practices. Data is at the heart of science and scientific endeavour. The curation of data, and the science associated with it, is as important as ever for understanding the changing Earth system and for making future predictions. Geoscience Data Journal is working with recognised Data Centres across the globe to develop the future strategy for data publication, the recognition of the value of data, and the communication and exploitation of data to the wider science and stakeholder communities.