An Enhanced Mondrian Anonymization Model based on Self-Organizing Map

2020 15th Asia Joint Conference on Information Security (AsiaJCIS). Pub Date: 2020-08-01. DOI: 10.1109/AsiaJCIS50894.2020.00026

Peter Shaojui Wang, Pin-Yen Huang, Yu-An Tsai, R. Tso

Citations: 1

Abstract

An Enhanced Mondrian Anonymization Model based on Self-Organizing Map

In the era of big data, privacy preservation has become a central concern in data mining. Mondrian anonymization is a state-of-the-art anonymization algorithm for relational datasets, widely used to implement classical syntactic privacy models for privacy-preserving data mining such as k-anonymity, l-diversity, and t-closeness. It is named for the way it partitions multidimensional data in geometric space to find the best partitions for anonymization. However, Mondrian anonymization consumes too much time and memory on high-dimensional data. Another problem is that Mondrian-based privacy preservation may make the performance of data mining models unstable: in Mondrian-based k-anonymity, for example, data mining accuracy may drop dramatically as k grows. To solve these problems, we propose an enhanced Mondrian anonymization model based on the Self-Organizing Map (SOM-Mondrian). The SOM converts multidimensional data from a high-dimensional space into a two-dimensional space while preserving the topological properties of the input space. The Mondrian algorithm then uses the resulting two-dimensional data to find the best partitions for anonymization. To the best of our knowledge, we are the first to propose a SOM-based method for Mondrian anonymization. Experimental results show that, with the proposed method, the processing time of Mondrian anonymization decreases significantly, from 12.11 seconds to 0.16 seconds. In addition, the accuracy of data mining applications is about 2% higher than under standard Mondrian anonymization, and is steadier and more robust to varying k (the degree of variation is reduced by 75%).
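
The two-stage idea in the abstract (project records onto a two-dimensional SOM grid, then run Mondrian on the projected coordinates) can be illustrated with a minimal sketch. This is not the authors' implementation: the grid size, learning-rate and neighbourhood schedules, the synthetic records, and the function names `train_som`, `best_matching_unit`, and `mondrian` are all assumptions made for illustration. The split is a simple median split by rank, so records that share a grid cell may land on both sides of a cut.

```python
import math
import random


def best_matching_unit(x, weights):
    """Grid coordinate of the node whose weight vector is closest to x."""
    return min(weights,
               key=lambda n: sum((a - b) ** 2 for a, b in zip(x, weights[n])))


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a tiny rows x cols SOM on data scaled to [0, 1].

    Returns a dict mapping each grid coordinate (r, c) to its weight vector.
    The hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        for x in rng.sample(data, len(data)):     # shuffled pass over the data
            bmu = best_matching_unit(x, weights)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull node toward x
    return weights


def mondrian(indices, coords, k):
    """Recursively median-split record indices on their 2-D coordinates.

    A cut is kept only if both halves contain at least k records, so every
    returned group satisfies the k-anonymity size requirement.
    """
    spans = [max(coords[i][d] for i in indices) -
             min(coords[i][d] for i in indices) for d in (0, 1)]
    for d in sorted((0, 1), key=lambda dim_: -spans[dim_]):  # widest first
        if spans[d] == 0:
            continue                                # nothing to split on
        ordered = sorted(indices, key=lambda i: coords[i][d])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, coords, k) + mondrian(right, coords, k)
    return [indices]                                # no allowable cut remains


# Synthetic stand-in for a relational table: 60 records, 5 numeric attributes.
data_rng = random.Random(1)
records = [[data_rng.random() for _ in range(5)] for _ in range(60)]

som = train_som(records, rows=4, cols=4)
coords = [best_matching_unit(x, som) for x in records]   # 5-D -> 2-D grid
groups = mondrian(list(range(len(records))), coords, k=5)
print(len(groups), "groups; smallest has", min(len(g) for g in groups))
```

Because Mondrian here only ever inspects two coordinates per record, the cost of choosing a split dimension no longer grows with the number of original attributes, which is the intuition behind the speed-up the abstract reports. Generalizing each group's original attribute values (for example, to ranges) would be the remaining step and is omitted from this sketch.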
