Towards Use And Reuse Driven Big Data Management

Zhiwu Xie, Yinlin Chen, J. Speer, T. Walters, P. Tarazaga, M. Kasarda
Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, published 2015-06-21. DOI: 10.1145/2756406.2756924 (https://doi.org/10.1145/2756406.2756924)
Citations: 9

Abstract

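We propose a use- and reuse-driven approach to big data management that fuses data repository and data processing capabilities in a co-located public cloud. It addresses the urgent data management needs of the growing number of researchers who fit neither side of the big science/small science dichotomy. The approach lets researchers use, manage, and collaborate around big data sets more easily, and it gives librarians the opportunity to work alongside researchers to preserve and curate data while the data are still fresh and in active use. It also provides a technological foundation for a sharing culture that is closer to the open-source software development paradigm than to either the lone-wolf, gift-exchanging sharing of small science or the top-down, highly structured sharing of big science. To realize this vision, we present a system architecture that couples a scalable digital repository system with co-located cloud storage and cloud computing, together with a job scheduler and a deployment management system. Motivated by Virginia Tech's Goodwin Hall instrumentation project, we implemented and evaluated a prototype. The results show not only sufficient capacity for this particular use case, but also near-perfectly linear scalability in both storage and data processing under moderately high workloads.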
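The abstract reports "near-perfectly linear" storage and processing scalability but does not define a metric at this point. One common way to quantify such a claim is parallel (scaling) efficiency, E(N) = T(N) / (N · T(1)), where T(N) is the throughput measured with N nodes or workers; E(N) close to 1.0 means throughput grows in proportion to N. The sketch below is a minimal illustration under that assumption, not the authors' evaluation method: the function names, the 10% tolerance, and the sample numbers are all hypothetical.

```python
from typing import Dict


def scaling_efficiency(throughput: Dict[int, float]) -> Dict[int, float]:
    """Return parallel efficiency E(N) = T(N) / (N * T(1)) for each node count N.

    `throughput` maps a node (or worker) count N to a measured throughput T(N),
    e.g. GB stored per hour or files processed per minute (hypothetical units).
    """
    if 1 not in throughput:
        raise ValueError("a single-node baseline T(1) is required")
    base = throughput[1]
    if base <= 0:
        raise ValueError("baseline throughput must be positive")
    # Compare each measurement with ideal linear scaling, N * T(1).
    return {n: t / (n * base) for n, t in sorted(throughput.items())}


def is_near_linear(throughput: Dict[int, float], tolerance: float = 0.1) -> bool:
    """True if every E(N) lies within `tolerance` of 1.0.

    The default tolerance of 0.1 is an assumed threshold, not one from the paper.
    """
    return all(abs(e - 1.0) <= tolerance
               for e in scaling_efficiency(throughput).values())


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only (not data from the paper).
    sample = {1: 100.0, 2: 198.0, 4: 392.0, 8: 776.0}
    print(scaling_efficiency(sample))
    print(is_near_linear(sample))
```

With these made-up numbers, efficiency falls only from 1.0 to about 0.97 between one and eight nodes, which is the kind of profile "near-perfectly linear" describes; a workload whose throughput flattens as nodes are added would instead show E(N) dropping well below 1.0.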