On Computational Thinking, Inferential Thinking and Data Science

Michael I. Jordan
Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. Published 2016-07-11.
DOI: https://doi.org/10.1145/2935764.2935826
Citation count: 4

Abstract

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.
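
The contrast drawn above between "complexity" and "simplicity" can be made concrete with a small sketch. It is not taken from the talk; it is only an illustration under stated assumptions: the estimator is a plain sample mean, the data are i.i.d. Gaussian with known standard deviation, and "runtime" is approximated by counting additions. The function names (`mean_with_cost`, `standard_error`, `simulate`) are hypothetical. As n grows, the operation count grows linearly while the spread of the estimate shrinks roughly as sigma / sqrt(n).

```python
import math
import random
import statistics


def mean_with_cost(sample):
    """Return (sample mean, number of additions performed)."""
    total = 0.0
    for x in sample:
        total += x
    return total / len(sample), len(sample)


def standard_error(sigma, n):
    """Theoretical standard error of the mean of n i.i.d. draws."""
    return sigma / math.sqrt(n)


def simulate(n, trials=200, sigma=1.0, seed=0):
    """Empirical spread of the sample mean over repeated trials.

    Returns (standard deviation of the estimates, additions per trial).
    The Gaussian model and the fixed seed are assumptions made for
    illustration only.
    """
    rng = random.Random(seed)
    estimates = []
    cost = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        estimate, cost = mean_with_cost(sample)
        estimates.append(estimate)
    return statistics.pstdev(estimates), cost


if __name__ == "__main__":
    for n in (100, 400, 1600):
        spread, cost = simulate(n)
        print(n, cost, round(spread, 4), round(standard_error(1.0, n), 4))
```

Each fourfold increase in n quadruples the counted work but only halves the theoretical standard error, which is one simple way to see why a joint accounting of runtime and risk is needed.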