{"title":"The Creation of Artificial Data for Training a Neural Network Using the Example of a Conveyor Production Line for Flooring.","authors":"Alexey Zaripov, Roman Kulshin, Anatoly Sidorov","doi":"10.3390/jimaging11050168","DOIUrl":null,"url":null,"abstract":"<p><p>This work is dedicated to the development of a system for generating artificial data for training neural networks used within a conveyor-based technology framework. It presents an overview of the application areas of computer vision (CV) and establishes that traditional methods of data collection and annotation-such as video recording and manual image labeling-are associated with high time and financial costs, which limits their efficiency. In this context, synthetic data represents an alternative capable of significantly reducing the time and financial expenses involved in forming training datasets. Modern methods for generating synthetic images using various tools-from game engines to generative neural networks-are reviewed. As a tool-platform solution, the concept of digital twins for simulating technological processes was considered, within which synthetic data is utilized. Based on the review findings, a generalized model for synthetic data generation was proposed and tested on the example of quality control for floor coverings on a conveyor line. The developed system provided the generation of photorealistic and diverse images suitable for training neural network models. A comparative analysis showed that the YOLOv8 model trained on synthetic data significantly outperformed the model trained on real images: the mAP50 metric reached 0.95 versus 0.36, respectively. This result demonstrates the high adequacy of the model built on the synthetic dataset and highlights the potential of using synthetic data to improve the quality of computer vision models when access to real data is limited.</p>","PeriodicalId":37035,"journal":{"name":"Journal of Imaging","volume":"11 5","pages":""},"PeriodicalIF":2.7000,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12112862/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/jimaging11050168","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"IMAGING SCIENCE & PHOTOGRAPHIC TECHNOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
This work is dedicated to the development of a system for generating artificial data for training neural networks used within a conveyor-based technology framework. It presents an overview of the application areas of computer vision (CV) and establishes that traditional methods of data collection and annotation-such as video recording and manual image labeling-are associated with high time and financial costs, which limits their efficiency. In this context, synthetic data represents an alternative capable of significantly reducing the time and financial expenses involved in forming training datasets. Modern methods for generating synthetic images using various tools-from game engines to generative neural networks-are reviewed. As a tool-platform solution, the concept of digital twins for simulating technological processes was considered, within which synthetic data is utilized. Based on the review findings, a generalized model for synthetic data generation was proposed and tested on the example of quality control for floor coverings on a conveyor line. The developed system provided the generation of photorealistic and diverse images suitable for training neural network models. A comparative analysis showed that the YOLOv8 model trained on synthetic data significantly outperformed the model trained on real images: the mAP50 metric reached 0.95 versus 0.36, respectively. This result demonstrates the high adequacy of the model built on the synthetic dataset and highlights the potential of using synthetic data to improve the quality of computer vision models when access to real data is limited.