Sharing confidential data for algorithm development by multiple imputation

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management
Pub Date: 2013-07-29
DOI: 10.1145/2484838.2484865

S. Verwer, S. V. D. Braak, Sunil Choenni

Citations: 2

Abstract

The availability of real-life data sets is crucial for algorithm and application development, because such development often requires insight into the specific properties of the data. Often, however, such data are not released because they are proprietary and confidential. We propose to solve this problem using the statistical technique of multiple imputation, a powerful method for generating realistic synthetic data sets. Additionally, we show how the generated records can be combined into networked data using clustering techniques.
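The abstract names multiple imputation as the synthesis technique but gives no model details. What follows is a minimal sketch of one common variant, fully synthetic data, under stated assumptions. Every confidential value is treated as missing. Model parameters are drawn approximately by bootstrap resampling. Then m synthetic copies are drawn from the fitted models. The two-variable (age, income) setup, the normal and linear models, and the names `synthesize` and `fit_linear` are hypothetical illustrations, not the authors' method.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return a, b, statistics.stdev(residuals)


def synthesize(records, m=5, seed=0):
    """Return m fully synthetic copies of `records`, a list of (x, y) pairs.

    Assumed models (a simplification for illustration):
      x      ~ Normal(mean, sd)       estimated on a bootstrap resample
      y | x  ~ a + b*x + Normal(0, s) fitted on the same resample
    The bootstrap resample stands in for a draw of the model parameters,
    so the m copies also reflect uncertainty about those parameters.
    """
    rng = random.Random(seed)
    n = len(records)
    datasets = []
    for _ in range(m):
        boot = [rng.choice(records) for _ in range(n)]
        xs = [x for x, _ in boot]
        ys = [y for _, y in boot]
        mu, sd_x = statistics.fmean(xs), statistics.stdev(xs)
        a, b, sd_y = fit_linear(xs, ys)
        # Draw every synthetic record from the fitted models; no real record is copied.
        synthetic = []
        for _ in range(n):
            x = rng.gauss(mu, sd_x)
            y = a + b * x + rng.gauss(0.0, sd_y)
            synthetic.append((x, y))
        datasets.append(synthetic)
    return datasets


# Hypothetical "confidential" data: (age, income) pairs with a linear relation.
rng = random.Random(42)
confidential = []
for _ in range(200):
    age = rng.uniform(20, 65)
    income = 1000 + 50 * age + rng.gauss(0, 300)
    confidential.append((age, income))

copies = synthesize(confidential, m=5, seed=1)

# An analyst fits the same model on each synthetic copy and averages the estimates.
slopes = [fit_linear([x for x, _ in d], [y for _, y in d])[1] for d in copies]
combined_slope = statistics.fmean(slopes)
print("slope per copy:", [round(s, 1) for s in slopes])
print(f"combined slope: {combined_slope:.1f}")
```

Releasing several copies instead of one lets a user see how much of the spread in an estimate comes from the synthesis itself. The combined point estimate is the average over the m copies. A real release would need richer models, for example nonlinear or categorical ones, than the single linear relation assumed here.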

