On Uncertain Probabilistic Data Modeling

International journal of database theory and application. Pub Date: 2016-12-31. DOI: 10.14257/ijdta.2016.9.12.17

Teng Lv, Ping Yan, Weimin He

Pages: 185-194


Abstract

Uncertainty in data has various causes, including the data itself, data mapping, and data policy. The data itself can be uncertain for many reasons; for example, data from a sensor network, the Internet of Things, or Radio Frequency Identification is often inaccurate and uncertain because of device or environmental factors. With data mapping, data integrated from heterogeneous sources is commonly uncertain because of uncertain mappings, data inconsistency, missing data, and dirty data. With data policy, data is modified or hidden to comply with an organization's data privacy and data confidentiality policies. Traditional data management, however, mainly deals with deterministic data that is precise and certain, and cannot process uncertain data. Modeling uncertain data is the foundation for technologies that process the data further, such as indexing, querying, searching, mapping, integrating, and mining. Probabilistic data models for relational databases, XML data, and graph data are widely used in many applications and areas today, such as the World Wide Web, the semantic web, sensor networks, the Internet of Things, mobile ad-hoc networks, social networks, traffic networks, biological networks, genome databases, and medical records. This paper surveys the different probabilistic models of uncertain data in relational databases, XML data, and graph data. The advantages and disadvantages of each kind of probabilistic model are analyzed and compared. Open topics in modeling uncertain probabilistic data, such as semantic and computational aspects, are discussed. Criteria for modeling uncertain data, such as expressive power, complexity, efficiency, and extensibility, are also proposed.


