Authors: Tamanna, Dominik C. Hezel, Horst R. Marschall
DOI: 10.1002/gdj3.70036
Journal: Geoscience Data Journal, 12(4)
Published: 2025-10-02
URL: https://rmets.onlinelibrary.wiley.com/doi/10.1002/gdj3.70036
MRMinerals and MineralTD: Machine-Readable Mineral Formula and Compositions Data Set for Data-Driven Research
Artificial intelligence (AI) is increasingly applied in the geosciences, particularly in fields such as mineralogy, where it supports tasks such as mineral classification, automated thin-section image analysis, and mineral exploration targeting. Such tasks require large, structured, and standardized data sets, which are currently not available. We build two databases to fill this gap: (i) MRMinerals contains a list of the 400 most common and geologically significant minerals, including major rock-forming minerals, key accessory minerals, and economically important ore minerals, with machine-readable formulas as the key feature. (ii) MineralTD contains a large training data set with 10,000+ compositions for each of the 400 minerals in MRMinerals. MineralTD is split into two subdatasets: MineralTDMeasured and MineralTDSynthetic. MineralTDMeasured contains approximately 140,000 mineral compositions from the open-access geochemical databases and repositories GEOROC, Pangaea, PetDB, RRUFF, and ESMD. MineralTDSynthetic contains synthetic mineral compositions, generated using the machine-readable formulas from MRMinerals, with at least 10,000 compositions per mineral. MineralTD is annotated with metadata, such as mineral frequency, rock classification, data source, and methods used, to provide a full understanding of each individual data set. MRMinerals and MineralTD are ready-to-use, open-access data sets that enable scalable, data-driven research in mineralogy, e.g., ML applications.
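The abstract does not specify how MRMinerals encodes its formulas or how MineralTDSynthetic samples compositions, so the following is only a minimal sketch of the general idea of generating synthetic compositions from a machine-readable formula. The site-based encoding of olivine, the names `OLIVINE`, `random_fractions`, and `synthetic_composition`, and the simple random mixing scheme are all assumptions made for illustration and are not taken from the data sets.

```python
import random

# Atomic masses in g/mol, rounded standard values.
ATOMIC_MASS = {"Mg": 24.305, "Fe": 55.845, "Si": 28.086, "O": 15.999}

# Hypothetical machine-readable encoding of olivine, (Mg,Fe)2SiO4.
# Each site lists the elements that may occupy it and the number of
# atoms per formula unit. This layout is an assumption, not the
# actual MRMinerals format.
OLIVINE = [
    {"elements": ["Mg", "Fe"], "count": 2},
    {"elements": ["Si"], "count": 1},
    {"elements": ["O"], "count": 4},
]


def random_fractions(n, rng):
    """Return n strictly positive fractions that sum to 1."""
    # 1.0 - rng.random() lies in (0, 1], so the total is never zero.
    draws = [1.0 - rng.random() for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def synthetic_composition(sites, rng):
    """Sample one composition.

    Returns a tuple (atoms per formula unit, element weight percent).
    """
    atoms = {}
    for site in sites:
        fractions = random_fractions(len(site["elements"]), rng)
        for element, fraction in zip(site["elements"], fractions):
            atoms[element] = atoms.get(element, 0.0) + fraction * site["count"]
    mass = {el: n * ATOMIC_MASS[el] for el, n in atoms.items()}
    total_mass = sum(mass.values())
    wt_percent = {el: 100.0 * m / total_mass for el, m in mass.items()}
    return atoms, wt_percent


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [synthetic_composition(OLIVINE, rng) for _ in range(5)]
for atoms, wt_percent in samples:
    print({el: round(value, 2) for el, value in wt_percent.items()})
```

Each sample keeps the stoichiometry fixed (two atoms on the Mg-Fe site, one Si, four O) and varies only the site occupancy, which is one plausible way a formula with substitution sites can yield many distinct compositions per mineral.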
Geoscience Data Journal
Subject categories: GEOSCIENCES, MULTIDISCIPLINARY; METEOROLOGY-METEOROLOGY & ATMOSPHERIC SCIENCES
CiteScore: 5.90
Self-citation rate: 9.40%
Articles published: 35
Review time: 4 weeks
Journal description:
Geoscience Data Journal provides an Open Access platform where scientific data can be formally published in a way that includes scientific peer review. Thus, the dataset creator attains full credit for their efforts, while also improving the scientific record, providing version control for the community, and allowing major datasets to be fully described, cited, and discovered.
An online-only journal, GDJ publishes short data papers cross-linked to – and citing – datasets that have been deposited in approved data centres and awarded DOIs. The journal will also accept articles on data services, and articles which support and inform data publishing best practices.
Data is at the heart of science and scientific endeavour. The curation of data and the science associated with it are as important as ever for understanding the changing Earth system and for enabling future predictions. Geoscience Data Journal is working with recognised data centres across the globe to develop the future strategy for data publication, the recognition of the value of data, and the communication and exploitation of data to the wider science and stakeholder communities.