An automatic scientific data collection framework for materials science
Ziyi Chen, Yang Yuan, Sihan Liang, Meng Wan, Kai Li, Weiqi Zhou, Yangang Wang, Zongguo Wang
Computational Materials Science, volume 252, Article 113772. Published 2025-02-24. DOI: 10.1016/j.commatsci.2025.113772
https://www.sciencedirect.com/science/article/pii/S0927025625001156
Citations: 0
Abstract
With the rapid development of information technology, there has been an exponential increase in material data. However, inconsistent data formats and non-standardized storage methods have emerged as primary obstacles for researchers seeking to use materials science data effectively. To fully exploit material data from diverse sources and achieve the efficient fusion of historical data, this paper introduces a database application framework designed for the automatic collection and analysis of multi-source heterogeneous material data, and establishes two first-principles calculation datasets. The standardized methods used in this work enable the automatic extraction, storage, and analysis of both discrete and database data while also offering an interface for data-driven scientific research. Moreover, the framework used for dataset construction can be deployed in both cloud-based virtual environments and on local servers, providing flexibility that not only facilitates data sharing but also ensures data privacy and customized control. The datasets and framework developed in this work offer a robust data foundation and a potent tool for researchers engaged in data-driven research.
About the journal:
The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.