Toward a data scalable solution for facilitating discovery of scientific data resources

Alan R. Chappell, Sutanay Choudhury, J. Feo, D. Haglin, Alessandro Morari, Sumit Purohit, K. Schuchardt, Antonino Tumeo, Jesse Weaver, Oreste Villa
Published in: International Symposium on Design and Implementation of Symbolic Computation Systems, 2013-11-18
DOI: 10.1145/2534645.2534655 (https://doi.org/10.1145/2534645.2534655)
Citations: 5

Abstract

Science is increasingly motivated by the need to process larger quantities of data. It is facing severe challenges in data collection, management, and processing, so much so that the computational demands of "data scaling" are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and material sciences. This paper presents a real-world use case in which we wish to answer queries provided by domain scientists in order to facilitate discovery of relevant science resources. The problem is that the metadata for these science resources is very large and is growing quickly, rapidly increasing the need for a data scaling solution. We propose a system -- SGEM -- designed for answering graph-based queries over large datasets on cluster architectures, and we report early results for our current capability.