On clusterization of "big data" streams

S. Berkovich, Duoduo Liao
International Conference and Exhibition on Computing for Geospatial Research & Application, published 2012-07-01
DOI: 10.1145/2345316.2345320 (https://doi.org/10.1145/2345316.2345320)
Citations: 15

Abstract

Big Data refers to the rising flood of digital data from many different sources, including sensors, digitizers, scanners, mobile phones, cameras, software-based tools, the internet, and so on. "Big" and "diverse" are two important characteristics of Big Data. The diversity of Big Data, which spans text, geometry, image, video, and sound, further increases the difficulty of processing it.

Coping with "Big Data" problems requires a radical change in the philosophy of how information processing is organized. Primarily, the Big Data approach has to modify the underlying computational model in order to manage the uncertainty of access to information items in a huge, nebulous environment. As a result, the produced outcomes are directly influenced only by some active part of all information items, while the rest of the available items only indirectly affect the choice of the active part. The organization of the brain, with its unconscious processing, exhibits analogous functionality, and the retrieval process in Google shows a characteristic similarity.

In this talk, we introduce a novel method for on-the-fly clusterization of amorphous data from diverse sources. The devised construction is based on the previously developed FuzzyFind Dictionary, which reverses the error-correction scheme of the Golay code. This clusterization involves processing intensive continuous data streams and can be implemented effectively using multi-core pipelining with forced interrupts. The suggested clusterization is especially suitable for the Big Data computational model because it realizes the requirement of purposeful selection of information items in the unsteady framework of cloud computing and stream processing. Furthermore, the uncertainties associated with this method of clusterization are moderated by the idea of bounded rationality, an approach that does not require complete, exact knowledge for sensible decision-making.
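
The abstract does not spell out how the Golay code is "reversed", so the following is only a minimal sketch of the general idea, not the authors' FuzzyFind Dictionary. It assumes each incoming item has already been reduced to a 23-bit binary feature vector, and it uses the perfect binary Golay (23,12) code as a fuzzy hash: every 23-bit vector lies within Hamming distance 3 of exactly one codeword, so decoding to that codeword and keeping its 12 message bits yields a cluster key shared by nearby vectors. The names `syndrome`, `encode`, `cluster_key` and `clusterize` are hypothetical, and the multi-core pipelining mentioned in the abstract is not modeled.

```python
from collections import defaultdict
from itertools import combinations

N, K = 23, 12   # Golay (23,12): 23-bit codewords carrying 12 message bits
GEN = 0xC75     # generator polynomial x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1


def syndrome(word: int) -> int:
    """Remainder of word(x) divided by GEN(x) over GF(2); an 11-bit value."""
    for i in range(N - 1, N - K - 1, -1):      # bit positions 22 down to 11
        if (word >> i) & 1:
            word ^= GEN << (i - (N - K))
    return word


# The code is perfect, so the error patterns of weight 0..3 have distinct
# syndromes and together cover all 2**11 possible syndromes.
SYNDROME_TO_ERROR = {}
for weight in range(4):
    for positions in combinations(range(N), weight):
        error = sum(1 << p for p in positions)
        SYNDROME_TO_ERROR[syndrome(error)] = error


def encode(message: int) -> int:
    """Systematic encoding: 12 message bits followed by 11 check bits."""
    shifted = message << (N - K)
    return shifted | syndrome(shifted)


def cluster_key(vector: int) -> int:
    """Decode a 23-bit vector to its nearest codeword; return its message bits."""
    codeword = vector ^ SYNDROME_TO_ERROR[syndrome(vector)]
    return codeword >> (N - K)


def clusterize(stream):
    """Group the 23-bit vectors of an iterable by cluster key, one item at a time."""
    clusters = defaultdict(list)
    for vector in stream:
        clusters[cluster_key(vector)].append(vector)
    return clusters
```

Under these assumptions, two vectors that each differ from the same codeword in at most three bit positions receive the same key, so each item is assigned with one table lookup as it arrives. Vectors that are close to each other but fall on opposite sides of a decoding boundary get different keys; a scheme of this kind would need extra probes to handle that case, which this sketch omits.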