Alan R. Chappell, Sutanay Choudhury, J. Feo, D. Haglin, Alessandro Morari, Sumit Purohit, K. Schuchardt, Antonino Tumeo, Jesse Weaver, Oreste Villa
Title: Toward a data scalable solution for facilitating discovery of scientific data resources
DOI: 10.1145/2534645.2534655
Venue: International Symposium on Design and Implementation of Symbolic Computation Systems
Published: 2013-11-18
Citations: 5
Abstract
Science is increasingly motivated by the need to process larger quantities of data. It faces severe challenges in data collection, management, and processing, so much so that the computational demands of "data scaling" are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and materials science. This paper presents a real-world use case in which we wish to answer queries provided by domain scientists in order to facilitate discovery of relevant science resources. The problem is that the metadata for these science resources is very large and growing quickly, which increases the need for a data scaling solution. We propose a system, SGEM, designed for answering graph-based queries over large datasets on cluster architectures, and we report early results for our current capability.