SynGen: Synthetic Data Generation
Akash Kothare, Shridhara Chaube, Yash Moharir, Gaurav Bajodia, S. Dongre
2021 International Conference on Computational Intelligence and Computing Applications (ICCICA), published 2021-11-26
DOI: 10.1109/iccica52458.2021.9697232
Citations: 7
Abstract
Synthetic data is artificial data generated using various machine learning techniques. It can be used to preserve privacy, test systems, or create training data for machine learning algorithms. Synthetic data generation is critical because the demand for specific data is large; for example, synthetic data can be used to practice data science tasks and techniques while maintaining the anonymity of the generated samples. We used an open-source engine named Faker (v5.6.1) and a Gaussian copula to create a platform that generates datasets based on user requirements and available resources. The user can also run a variety of machine learning algorithms and compare their performance on either the generated dataset or a predefined dataset.
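
The abstract names a Gaussian copula but does not describe how the platform applies it. As a rough illustration of the general technique only, the following is a minimal two-column sketch in Python using just the standard library. It is not the authors' implementation: every function name here (`normal_scores`, `pearson`, `empirical_quantile`, `sample_gaussian_copula`) and the toy `ages`/`incomes` data are assumptions made for this example. Faker-generated fields such as names or addresses are left out to keep the sketch free of third-party dependencies.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def normal_scores(values):
    """Map a column to standard-normal scores through its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), so inv_cdf is defined
        scores[i] = STD_NORMAL.inv_cdf((rank + 0.5) / n)
    return scores


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def empirical_quantile(sorted_values, u):
    """Return the observed value at quantile level u (0 <= u <= 1)."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_gaussian_copula(col_a, col_b, n_samples, seed=0):
    """Draw synthetic (a, b) pairs that keep each column's marginal
    distribution and the dependence between the two columns."""
    rng = random.Random(seed)
    # 1. Estimate the dependence in normal-score space.
    rho = pearson(normal_scores(col_a), normal_scores(col_b))
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    spread = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n_samples):
        # 2. Sample a correlated pair of standard normals.
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + spread * rng.gauss(0, 1)
        # 3. Map back to the original scales via the empirical quantiles.
        rows.append((
            empirical_quantile(sorted_a, STD_NORMAL.cdf(z1)),
            empirical_quantile(sorted_b, STD_NORMAL.cdf(z2)),
        ))
    return rows


# Toy "real" data (hypothetical): income rises with age, plus noise.
_rng = random.Random(42)
ages = [_rng.uniform(20, 60) for _ in range(500)]
incomes = [1000 * a + _rng.gauss(0, 5000) for a in ages]

synthetic = sample_gaussian_copula(ages, incomes, 1000, seed=1)
print(synthetic[:3])
```

Because the synthetic values are drawn from the observed quantiles, this sketch can reproduce individual original values; a system aimed at privacy would likely fit smooth marginal distributions instead, which is beyond what the abstract specifies.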