Information Spaces for Big Data Processing: Unification and Parallelization of Sequential Information Accumulation Procedures

P. Golubtsov
DOI: 10.1109/CBI.2019.00031
URL: https://doi.org/10.1109/CBI.2019.00031
Published in: 2019 IEEE 21st Conference on Business Informatics (CBI)
Publication date: 2019-07-15
Citations: 0

Abstract

In large-scale research, data are usually collected at many sites, are huge in volume, and are constantly being generated. Since it is often impossible to collect all the relevant data on a single computer, much attention is paid to algorithms that accumulate information sequentially or in parallel and do not need to store all the original data. As an example of information accumulation, the Bayesian updating procedure for linear experiments is analyzed. The corresponding information spaces are defined and the relations between them are studied. It is shown that processing can be unified and simplified by introducing a special canonical form of information representation and transforming all the data and the original prior information into this form. Thanks to the rich algebraic properties of the canonical information space, the sequential Bayesian procedure allows various parallelization options that are ideally suited for distributed data processing platforms, such as Hadoop MapReduce. This opens up the possibility of flexible and efficient scaling of information accumulation in distributed data processing systems.
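
The idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the standard information (precision) form of the linear Gaussian model y = a·x + noise, in which each observation is mapped to a pair (T, v) = (a aᵀ / σ², a y / σ²) and the prior N(m0, F0) is mapped to (F0⁻¹, F0⁻¹ m0). The function names `to_canonical`, `combine`, and `solve`, the toy data, and the unit noise variance are all assumptions made for illustration. Because combining two pairs is plain addition, which is associative and commutative, the data can be reduced in any grouping, which is what a MapReduce-style job relies on.

```python
from fractions import Fraction
from functools import reduce

def to_canonical(a, y, noise_var=1):
    """Map one scalar observation y = a.x + noise to the pair (T, v)."""
    w = Fraction(1) / Fraction(noise_var)            # observation weight 1/sigma^2
    T = [[w * ai * aj for aj in a] for ai in a]      # a a^T / sigma^2
    v = [w * ai * y for ai in a]                     # a y / sigma^2
    return T, v

def combine(p, q):
    """Combine two pieces of information by elementwise addition."""
    (T1, v1), (T2, v2) = p, q
    T = [[x + z for x, z in zip(r1, r2)] for r1, r2 in zip(T1, T2)]
    v = [x + z for x, z in zip(v1, v2)]
    return T, v

def solve(T, v):
    """Solve T x = v by Gauss-Jordan elimination (T assumed invertible)."""
    n = len(v)
    M = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(T, v)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a nonzero pivot
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * z for x, z in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

# Hypothetical prior N(0, I), written directly in canonical form (precision, precision * mean).
prior = ([[1, 0], [0, 1]], [0, 0])

# Toy observations (a, y); the values are made up for illustration.
observations = [([1, 0], 2), ([0, 1], 3), ([1, 1], 5), ([1, -1], -1)]
infos = [to_canonical(a, y) for a, y in observations]    # "map" step

# Sequential accumulation: fold the observations into the prior one by one.
sequential = reduce(combine, infos, prior)

# Parallel accumulation: reduce two chunks independently, then merge with the prior.
part1 = reduce(combine, infos[:2])
part2 = reduce(combine, infos[2:])
parallel = combine(prior, combine(part1, part2))

posterior_mean = solve(*sequential)                      # T^{-1} v
print([str(x) for x in posterior_mean])                  # ['3/2', '9/4']
```

Exact rational arithmetic is used here only so that the sequential and chunked results can be compared for exact equality; a practical implementation would more likely use floating-point linear algebra.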