Privacy aware data generation for testing database applications
Authors: Xintao Wu, Chintan Sanghvi, Yongge Wang, Yuliang Zheng
Venue: 9th International Database Engineering & Application Symposium (IDEAS'05)
Published: 2005-07-25
DOI: 10.1109/IDEAS.2005.45 (https://doi.org/10.1109/IDEAS.2005.45)
Testing of database applications is of great importance. A significant issue in database application testing is the availability of representative data. In this paper, we investigate the problem of generating a synthetic database based on a priori knowledge about a production database. Our approach is to fit a general location model using various characteristics (e.g., constraints, statistics, rules) extracted from the production database and then generate the synthetic data using the learned model. The generated data is valid and similar to the real data in terms of statistical distribution; hence it can be used for functional and performance testing. Because the extracted characteristics may contain information that an attacker could use to derive confidential information about individuals, we present our disclosure analysis method, which applies a cell suppression technique for identity disclosure analysis and perturbation for value disclosure.
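The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fitting procedure, the suppression rule, or the noise model, so everything below is an assumption made for illustration. A general location model treats each combination of categorical values as a cell with its own probability, and models the continuous attribute within a cell as normally distributed with a cell-specific mean and a shared variance. The sketch assumes a single continuous attribute, a simple count threshold as a stand-in for cell suppression (real cell suppression usually also needs complementary suppressions), and additive Gaussian noise as the perturbation. The attribute names, counts, means, and function names are all hypothetical.

```python
import random

def suppress_small_cells(cell_counts, threshold=3):
    """Drop cells with fewer than `threshold` records (assumed suppression rule)."""
    return {cell: n for cell, n in cell_counts.items() if n >= threshold}

def perturb_means(cell_means, scale, rng):
    """Add zero-mean Gaussian noise to each cell mean (assumed perturbation)."""
    return {cell: m + rng.gauss(0.0, scale) for cell, m in cell_means.items()}

def generate(cell_counts, cell_means, shared_std, n_rows, seed=0):
    """Sample rows from a general location model with one continuous attribute.

    A cell is drawn in proportion to its count, then the continuous value is
    drawn from a normal distribution with that cell's mean and a shared
    standard deviation.
    """
    rng = random.Random(seed)
    cells = list(cell_counts)
    weights = [cell_counts[c] for c in cells]
    rows = []
    for _ in range(n_rows):
        cell = rng.choices(cells, weights=weights, k=1)[0]
        value = rng.gauss(cell_means[cell], shared_std)
        rows.append((*cell, value))
    return rows

# Hypothetical statistics extracted from a production table:
# cells are (gender, state), the continuous attribute is a salary.
counts = {("F", "NC"): 40, ("M", "NC"): 55, ("F", "SC"): 2}
means = {("F", "NC"): 52000.0, ("M", "NC"): 51000.0, ("F", "SC"): 90000.0}

safe_counts = suppress_small_cells(counts)            # removes the 2-record cell
noise_rng = random.Random(42)
safe_means = perturb_means({c: means[c] for c in safe_counts}, 500.0, noise_rng)

rows = generate(safe_counts, safe_means, shared_std=8000.0, n_rows=5)
for row in rows:
    print(row)
```

Suppressing the two-record cell before generation keeps a rare combination of categorical values from appearing in the synthetic table, and perturbing the means keeps the exact per-cell statistics out of the model. The synthetic rows still follow the broad distribution of the remaining cells, which is the property the abstract relies on for functional and performance testing.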