{"title":"Towards Privacy-Preserving Relational Data Synthesis via Probabilistic Relational Models","authors":"Malte Luttermann, Ralf Möller, Mattis Hartwig","doi":"arxiv-2409.04194","DOIUrl":null,"url":null,"abstract":"Probabilistic relational models provide a well-established formalism to\ncombine first-order logic and probabilistic models, thereby allowing to\nrepresent relationships between objects in a relational domain. At the same\ntime, the field of artificial intelligence requires increasingly large amounts\nof relational training data for various machine learning tasks. Collecting\nreal-world data, however, is often challenging due to privacy concerns, data\nprotection regulations, high costs, and so on. To mitigate these challenges,\nthe generation of synthetic data is a promising approach. In this paper, we\nsolve the problem of generating synthetic relational data via probabilistic\nrelational models. In particular, we propose a fully-fledged pipeline to go\nfrom relational database to probabilistic relational model, which can then be\nused to sample new synthetic relational data points from its underlying\nprobability distribution. As part of our proposed pipeline, we introduce a\nlearning algorithm to construct a probabilistic relational model from a given\nrelational database.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":"13 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Databases","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.04194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Probabilistic relational models provide a well-established formalism to
combine first-order logic and probabilistic models, thereby allowing to
represent relationships between objects in a relational domain. At the same
time, the field of artificial intelligence requires increasingly large amounts
of relational training data for various machine learning tasks. Collecting
real-world data, however, is often challenging due to privacy concerns, data
protection regulations, high costs, and so on. To mitigate these challenges,
the generation of synthetic data is a promising approach. In this paper, we
solve the problem of generating synthetic relational data via probabilistic
relational models. In particular, we propose a fully-fledged pipeline to go
from relational database to probabilistic relational model, which can then be
used to sample new synthetic relational data points from its underlying
probability distribution. As part of our proposed pipeline, we introduce a
learning algorithm to construct a probabilistic relational model from a given
relational database.