Infinite Probabilistic Databases
Authors: Martin Grohe, P. Lindner
DOI: 10.46298/lmcs-18(1:34)2022 (https://doi.org/10.46298/lmcs-18(1:34)2022)
Published in: Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory, volume 211 1, pages 16:1-16:20
Publication date: 2019-04-14
Citation count: 14
Abstract
Probabilistic databases (PDBs) model uncertainty in data in a quantitative
way. In the established formal framework, probabilistic (relational) databases
are finite probability spaces over relational database instances. This
finiteness can clash with intuitive query behavior (Ceylan et al., KR 2016),
and with application scenarios that are better modeled by continuous
probability distributions (Dalvi et al., CACM 2009).
We formally introduced infinite PDBs in (Grohe and Lindner, PODS 2019) with a
primary focus on countably infinite spaces. However, an extension beyond
countable probability spaces raises nontrivial foundational issues concerned
with the measurability of events and queries and ultimately with the question
whether queries have a well-defined semantics.
We argue that finite point processes are an appropriate model from
probability theory for dealing with general probabilistic databases. This
allows us to construct suitable (uncountable) probability spaces of database
instances in a systematic way. Our main technical results are measurability
statements for relational algebra queries as well as aggregate queries and
Datalog queries.
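
As a rough illustration of the established finite framework described above, the following minimal sketch represents a PDB as a finite set of instances with probabilities and evaluates a Boolean query by summing over the instances that satisfy it. The relation name `Temp`, the facts, the probabilities, and the helper functions are all hypothetical and not taken from the paper. In this finite setting every set of instances is an event, so no measurability question arises; that question only appears once the space of instances becomes uncountable.

```python
from fractions import Fraction

# A fact is a (relation name, tuple of values) pair; an instance is a frozenset of facts.
# All names and data below are hypothetical, chosen only for illustration.
r1 = ("Temp", ("room1", 20))
r2 = ("Temp", ("room1", 21))

# A finite PDB: finitely many instances, each with a probability summing to 1.
pdb = {
    frozenset(): Fraction(1, 10),
    frozenset({r1}): Fraction(6, 10),
    frozenset({r2}): Fraction(3, 10),
}


def probability(pdb, event):
    """Probability that a random instance satisfies the Boolean predicate `event`."""
    return sum(p for instance, p in pdb.items() if event(instance))


def has_reading_at_least(threshold):
    """Boolean query: some Temp fact records a value >= threshold."""
    return lambda instance: any(
        rel == "Temp" and args[1] >= threshold for rel, args in instance
    )


# The probabilities form a distribution over instances.
assert sum(pdb.values()) == 1

# Probability that the database contains a reading of at least 21.
p = probability(pdb, has_reading_at_least(21))
```

Exact fractions are used instead of floats so that the check that the probabilities sum to 1 is not affected by rounding.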