An Enhanced Mondrian Anonymization Model based on Self-Organizing Map
Peter Shaojui Wang, Pin-Yen Huang, Yu-An Tsai, R. Tso
2020 15th Asia Joint Conference on Information Security (AsiaJCIS), 2020. DOI: 10.1109/AsiaJCIS50894.2020.00026
In the era of big data, privacy preservation has become a central concern in data mining. Mondrian anonymization is a state-of-the-art anonymization algorithm for relational datasets and is widely used in classical syntactic privacy-preserving data mining methods such as k-anonymity, l-diversity, and t-closeness. It takes its name from the way it partitions multidimensional data in geometric space to find the best partitions for anonymization. Mondrian anonymization has two problems, however. First, it consumes too much time and memory on high-dimensional data. Second, Mondrian-based privacy preservation can make the performance of data mining models unstable; in Mondrian-based k-anonymity, for example, the accuracy of data mining may drop dramatically as k grows. To address these problems, this paper proposes an enhanced Mondrian anonymization model based on the Self-Organizing Map (SOM-Mondrian). The SOM converts multidimensional data from a high-dimensional space into a two-dimensional space while preserving the topological properties of the input space, and the Mondrian algorithm then partitions the resulting two-dimensional data to find the best partitions for anonymization. To the best of our knowledge, we are the first to propose a SOM-based method for Mondrian anonymization. Experimental results show that the proposed method reduces the processing time of Mondrian anonymization from 12.11 seconds to 0.16 seconds and raises the accuracy of data mining applications by about 2% over standard Mondrian anonymization. That accuracy is also steadier and more robust to varying k, with the degree of variation reduced by 75%.
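To make the partitioning step concrete, here is a minimal sketch of strict median-split Mondrian partitioning for k-anonymity. It is not the authors' implementation. It assumes numeric quasi-identifiers only, picks the cut dimension by raw (not normalized) value range, and uses hypothetical helper names `mondrian` and `generalize`. Every returned partition holds at least k records, so replacing each record's quasi-identifiers with its partition's ranges yields k-anonymity.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def mondrian(records: Sequence[Record], k: int) -> List[List[Record]]:
    """Recursively median-split records; each leaf partition has >= k records
    (assuming the input itself has at least k records)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    records = list(records)
    if len(records) < 2 * k:
        # Any cut would leave one side with fewer than k records.
        return [records]
    dims = len(records[0])
    spans = [max(r[d] for r in records) - min(r[d] for r in records)
             for d in range(dims)]
    # Try dimensions from widest to narrowest value range.
    for d in sorted(range(dims), key=lambda i: spans[i], reverse=True):
        if spans[d] == 0:
            break  # this and all remaining dimensions are constant
        values = sorted(r[d] for r in records)
        median = values[len(values) // 2]
        left = [r for r in records if r[d] < median]
        right = [r for r in records if r[d] >= median]
        if len(left) >= k and len(right) >= k:  # allowable cut
            return mondrian(left, k) + mondrian(right, k)
    return [records]  # no allowable cut on any dimension


def generalize(partition: List[Record]) -> List[Tuple[float, float]]:
    """Summarize a partition as one (min, max) range per quasi-identifier."""
    dims = len(partition[0])
    return [(min(r[d] for r in partition), max(r[d] for r in partition))
            for d in range(dims)]
```

For example, with hypothetical (age, income) records `data = [(25, 50000), (27, 52000), (31, 61000), (35, 58000), (42, 70000), (45, 72000), (51, 80000), (53, 85000)]`, `mondrian(data, 2)` yields four two-record partitions, while `mondrian(data, 3)` stops after one cut and yields two larger, coarser ones. This shows why larger k tends to cost information and, downstream, model accuracy.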
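The dimensionality-reduction step can be illustrated with a small self-organizing map that projects each record onto the 2-D coordinates of its best-matching unit (BMU). This is a generic online SOM sketch, not the paper's configuration. The grid size, linear learning-rate decay, Gaussian neighborhood, and the names `train_som`, `best_matching_unit`, and `project` are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Grid = List[List[List[float]]]  # rows x cols x input-dimension weights


def best_matching_unit(weights: Grid, x: Vector) -> Tuple[int, int]:
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))  # squared Euclidean
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data: Sequence[Vector], rows: int, cols: int, epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> Grid:
    """Train a rows x cols self-organizing map with online updates."""
    rng = random.Random(seed)
    # Initialize every node with a copy of a randomly chosen input record.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            bi, bj = best_matching_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    # Gaussian neighborhood on the 2-D grid, centered at the BMU.
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * sigma ** 2))
                    w = weights[i][j]
                    for d in range(len(w)):
                        w[d] += lr * h * (x[d] - w[d])  # pull node toward x
            step += 1
    return weights


def project(weights: Grid, data: Sequence[Vector]) -> List[Tuple[int, int]]:
    """Map each record to the 2-D grid coordinates of its best-matching unit."""
    return [best_matching_unit(weights, x) for x in data]
```

Because the grid preserves neighborhoods, similar records tend to land on the same or adjacent nodes. The resulting integer coordinates could then serve as two quasi-identifiers for a Mondrian-style partitioner. In practice, inputs would usually be normalized first. The abstract does not say how the paper sizes the map or relates anonymized partitions back to the original attributes.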