Cuong Ly, William Frazier, Adam Olsen, Ian Schwerdt, Luther W. McDonald IV, Alex Hagen
Title: Improving microstructures segmentation via pretraining with synthetic data
Journal: Computational Materials Science, volume 249, Article 113639 (impact factor 3.1; JCR Q2, Materials Science, Multidisciplinary)
Publication date: 2025-02-05
DOI: 10.1016/j.commatsci.2024.113639
URL: https://www.sciencedirect.com/science/article/pii/S0927025624008607
Citation count: 0
Abstract
Image analysis of material microstructures through microscopy is an integral capability in the field of materials science. The topological and chemical information obtained through microscopy allows us to draw vital connections between material microstructures, properties, and processing. While scanning electron microscopy (SEM) images yield a considerable wealth of information interpretable by the intuition of experts, there has been significant interest in using machine learning, convolutional neural networks (CNNs) in particular, for such image analysis tasks. Training CNNs for an image analysis task requires a large annotated dataset. However, in many materials science applications, obtaining a large annotated dataset is costly and labor-intensive. In this work, we study the use of synthetic data to enlarge the available annotated experimental data of uranium oxide. We utilize a modified Potts model to simulate uranium oxide particles with morphologies similar to those observed experimentally. We then leverage an image-to-image translation model to render the simulated particles as if they had been acquired with SEM. Through this process, we obtain pairs of particle images and their corresponding SEM representations, which correspond to pairs of annotations and images. Unlike previous works, we use the synthetic data to pretrain a CNN model and then fine-tune that model with experimental data. We experimentally demonstrate that using synthetic data as an incremental learning step improves overall performance compared to training a model on combined synthetic and experimental data.
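The abstract does not say how the authors modify the Potts model, so the following is only a minimal sketch of the standard q-state Potts model with zero-temperature Metropolis updates, which is one common way to simulate grain-like domains. The function names, lattice size, and number of states are illustrative assumptions and are not taken from the paper.

```python
import random


def potts_energy(grid):
    """Total energy: number of unlike nearest-neighbour pairs (periodic edges)."""
    n = len(grid)
    energy = 0
    for i in range(n):
        for j in range(n):
            s = grid[i][j]
            # Count each bond once: the neighbour below and the neighbour to the right.
            energy += s != grid[(i + 1) % n][j]
            energy += s != grid[i][(j + 1) % n]
    return energy


def local_energy(grid, i, j, state):
    """Unlike-neighbour count for site (i, j) if it held `state`."""
    n = len(grid)
    neighbours = (
        grid[(i - 1) % n][j],
        grid[(i + 1) % n][j],
        grid[i][(j - 1) % n],
        grid[i][(j + 1) % n],
    )
    return sum(state != t for t in neighbours)


def coarsen(grid, q, sweeps, rng):
    """Zero-temperature Metropolis: accept a proposed state only if the
    energy does not increase. Modifies `grid` in place and returns it."""
    n = len(grid)
    for _ in range(sweeps * n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        proposal = rng.randrange(q)
        if local_energy(grid, i, j, proposal) <= local_energy(grid, i, j, grid[i][j]):
            grid[i][j] = proposal
    return grid


# Hypothetical settings chosen only for illustration.
rng = random.Random(0)
Q, N = 8, 16
grid = [[rng.randrange(Q) for _ in range(N)] for _ in range(N)]
energy_before = potts_energy(grid)
coarsen(grid, Q, sweeps=5, rng=rng)
energy_after = potts_energy(grid)
```

In a pipeline like the one described, the resulting label map would play the role of the annotation, and an image-to-image translation model would supply the matching SEM-like image. That translation step is not sketched here.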
About the journal:
The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.