An Approach For Generating High Quality Structured Data
Authors: Yunfei Jia, Xinhuan Zhang
Venue: 2023 5th International Conference on Communications, Information System and Computer Engineering (CISCE)
Publication date: 2023-04-14
DOI: https://doi.org/10.1109/CISCE58541.2023.10142328
Modeling structured data makes it possible to represent its distribution and to sample realistic synthetic records. Such data augmentation is important for data privacy protection, machine learning (ML) applications, and related tasks. However, modeling structured data distributions is challenging because the data mixes numerical and categorical columns, and structured data often suffers from class imbalance. This paper presents a method for generating structured data using generative adversarial networks (GANs). Discrete variables are transformed into continuous variables using an embedding model, while a variational Bayesian Gaussian mixture model (VBGMM) models the distribution of numerical variables. To address class imbalance, a multi-category generator is designed. The proposed method is evaluated using various metrics and is compared with other data generation techniques and traditional oversampling methods. The results demonstrate the effectiveness of the proposed method for structured data generation.
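The two preprocessing ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no architectural details, so every function name below is hypothetical. The embedding vectors are random stand-ins for learned ones. The mixture parameters (weights, means, standard deviations) are assumed to come from an already fitted VBGMM rather than being fitted here. The scaling by four standard deviations is an illustrative assumption.

```python
import math
import random


def make_embedding(categories, dim, seed=0):
    """Assign each category a dense vector (random here; learned in practice)."""
    rng = random.Random(seed)
    return {c: [rng.gauss(0.0, 1.0) for _ in range(dim)] for c in categories}


def decode_embedding(vector, table):
    """Map a continuous vector back to the category with the nearest embedding."""
    return min(table, key=lambda c: math.dist(vector, table[c]))


def encode_numeric(x, weights, means, stds):
    """Encode x as (index of most responsible component, value scaled within it).

    The mixture parameters are assumed to come from a fitted VBGMM.
    """
    # Weighted Gaussian density of x under each component; the largest one
    # identifies the component with the highest responsibility for x.
    dens = [
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, stds)
    ]
    k = max(range(len(dens)), key=dens.__getitem__)
    # Scale relative to the chosen component (factor 4 is an assumption).
    return k, (x - means[k]) / (4 * stds[k])


def decode_numeric(k, scaled, means, stds):
    """Invert encode_numeric for a given component index."""
    return scaled * 4 * stds[k] + means[k]


# Hypothetical usage: one categorical column and one bimodal numerical column.
table = make_embedding(["red", "green", "blue"], dim=4)
weights, means, stds = [0.7, 0.3], [0.0, 10.0], [1.0, 2.0]
component, scaled = encode_numeric(9.5, weights, means, stds)
restored = decode_numeric(component, scaled, means, stds)
```

Encoding each numerical value relative to its most likely mixture component gives a generator bounded, roughly unimodal targets even when the raw column is multimodal. Nearest-embedding decoding is one simple way to turn continuous generator output back into a discrete category.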