Uncertain: a first-order type for uncertain data

Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems

Pub Date: 2014-02-24

DOI: 10.1145/2541940.2541958

James Bornholt, Todd Mytkowicz, K. McKinley


Citations: 121

Abstract


Uncertain: a first-order type for uncertain data

Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives. This paper introduces Uncertain, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain improves expressiveness and accuracy. Whereas previous probabilistic programming languages focus on experts, Uncertain serves a wide range of developers. Experts still identify error distributions. However, both experts and application writers compute with distributions, improve estimates with domain knowledge, and ask questions with conditionals. The Uncertain type system and operators encourage developers to expose and reason about uncertainty explicitly, controlling false positives and false negatives. These benefits make Uncertain a compelling programming model for modern applications facing the challenge of uncertainty.
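The ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. Only the name `Uncertain` comes from the paper; `from_sampler`, `pr`, the memo dictionary, the sample count, and the GPS numbers are hypothetical choices made for illustration. A fixed-size sample proportion stands in here for the hypothesis tests the abstract mentions, so the sketch does not control false positives and negatives the way a real test would.

```python
import random
from typing import Callable, Dict


class Uncertain:
    """A lazily evaluated random variable (one node in a computation graph)."""

    def __init__(self, draw: Callable[[Dict[int, object]], object]):
        # `draw` produces one value, given a memo of values already drawn
        # for other nodes during the same joint sample.
        self._draw = draw

    @classmethod
    def from_sampler(cls, sampler: Callable[[], float]) -> "Uncertain":
        """Wrap a zero-argument sampling function as a leaf node (hypothetical helper)."""
        return cls(lambda memo: sampler())

    def _sample(self, memo: Dict[int, object]):
        # Draw each node at most once per joint sample, so that shared
        # operands (as in x - x) stay correlated.
        key = id(self)
        if key not in memo:
            memo[key] = self._draw(memo)
        return memo[key]

    def sample(self):
        """Draw one joint sample of the whole expression."""
        return self._sample({})

    def _lift(self, other, op) -> "Uncertain":
        # Build a new node; nothing is sampled until a question is asked.
        if not isinstance(other, Uncertain):
            const = other
            other = Uncertain(lambda memo: const)
        rhs = other
        return Uncertain(lambda memo: op(self._sample(memo), rhs._sample(memo)))

    def __add__(self, other):
        return self._lift(other, lambda a, b: a + b)

    def __sub__(self, other):
        return self._lift(other, lambda a, b: a - b)

    def __mul__(self, other):
        return self._lift(other, lambda a, b: a * b)

    def __truediv__(self, other):
        return self._lift(other, lambda a, b: a / b)

    def __gt__(self, other):
        # A comparison yields an uncertain Boolean, not a plain bool.
        return self._lift(other, lambda a, b: a > b)

    def __lt__(self, other):
        return self._lift(other, lambda a, b: a < b)

    def pr(self, threshold: float = 0.5, n: int = 2000) -> bool:
        """Is the sampled evidence for this uncertain Boolean above `threshold`?

        Assumption: a plain sample proportion over n draws, standing in
        for a proper hypothesis test.
        """
        hits = sum(1 for _ in range(n) if self.sample())
        return hits / n > threshold


# Usage: a GPS-style distance estimate with random error (made-up numbers).
random.seed(0)
distance = Uncertain.from_sampler(lambda: random.gauss(10.0, 4.0))  # metres
speed = distance / 2.0            # metres per second, still a distribution
fast = speed > 4.0                # uncertain Boolean
likely_fast = fast.pr(0.5)        # "more likely than not"
surely_fast = fast.pr(0.9)        # demands much stronger evidence
zero = distance - distance        # shared operand: every sample is exactly 0.0
```

With these numbers the speed is roughly normal with mean 5 and standard deviation 2, so the question "is speed above 4?" holds in about 69% of samples: `likely_fast` is true while `surely_fast` is false. Treating the mean as a fact would have answered "yes" unconditionally, which is the kind of uncertainty bug the abstract describes.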

