{"title":"技术视角:合成数据需要可重复性基准","authors":"Xi He","doi":"10.1145/3665252.3665266","DOIUrl":null,"url":null,"abstract":"Synthetic data is a vital substitute for real sensitive personal data in supporting social science research and policy studies. Extensive prior research has delved into various models for generating synthetic data, from traditional statistical approaches to cutting-edge deep-learning methods. However, selecting the most suitable one for unforeseen applications poses a significant challenge due to the varying strengths and weaknesses, dependent on factors such as the application domain, data distribution, analytical requirements, and privacy considerations.","PeriodicalId":346332,"journal":{"name":"ACM SIGMOD Record","volume":"21 6","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Technical Perspective: Synthetic Data Needs a Reproducibility Benchmark\",\"authors\":\"Xi He\",\"doi\":\"10.1145/3665252.3665266\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Synthetic data is a vital substitute for real sensitive personal data in supporting social science research and policy studies. Extensive prior research has delved into various models for generating synthetic data, from traditional statistical approaches to cutting-edge deep-learning methods. However, selecting the most suitable one for unforeseen applications poses a significant challenge due to the varying strengths and weaknesses, dependent on factors such as the application domain, data distribution, analytical requirements, and privacy considerations.\",\"PeriodicalId\":346332,\"journal\":{\"name\":\"ACM SIGMOD Record\",\"volume\":\"21 6\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM SIGMOD Record\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3665252.3665266\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM SIGMOD Record","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3665252.3665266","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Technical Perspective: Synthetic Data Needs a Reproducibility Benchmark
Synthetic data is a vital substitute for real sensitive personal data in supporting social science research and policy studies. Extensive prior research has delved into various models for generating synthetic data, from traditional statistical approaches to cutting-edge deep-learning methods. However, selecting the most suitable one for unforeseen applications poses a significant challenge due to the varying strengths and weaknesses, dependent on factors such as the application domain, data distribution, analytical requirements, and privacy considerations.