Schema-independent scientific data cataloging framework

Supun Nakandala, S. Withana, D. Kumarasiri, H. Jayawardena, H. D. Dilum Bandara, S. Perera, S. Marru, Sudhakar Pamidighantam
2015 Moratuwa Engineering Research Conference (MERCon), published 2015-04-07
DOI: 10.1109/MERCON.2015.7112361 (https://doi.org/10.1109/MERCON.2015.7112361)
Citations: 1

Abstract

Modern scientific experiments generate vast volumes of data that are hard to keep track of. Consequently, scientists find it difficult to reuse and share these data sets. We address this problem by developing a schema-independent data cataloging framework for efficient management of scientific data. The proposed solution consists of an agent that automatically identifies new data products and extracts metadata from them, and a server that indexes the metadata using a NoSQL database and provides a REST API for querying, sharing, and reusing the data sets. The novelty of our solution lies in the pluggable metadata extraction logic, extensible data product generation monitors, use of a NoSQL database, and the ability to dynamically add new metadata fields. The use of Apache Solr as the backend database enables the proposed solution to index and search data products much faster than a solution based on relational databases. For example, our Apache Solr-based implementation can resolve full-text, substring, prefix, and suffix queries 91% to 99% faster than a MySQL-based implementation.
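
The architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every name here (`Catalog`, `register`, `scan`, `key_value_extractor`) is hypothetical, an in-memory list stands in for the Apache Solr index, and a polling directory scan stands in for the data product generation monitors. The sketch shows only how extraction logic can be plugged in per file type and how records can carry arbitrary metadata fields without a fixed schema.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical type: an extractor maps a data product to free-form metadata.
Extractor = Callable[[Path], Dict[str, str]]


@dataclass
class Catalog:
    """In-memory stand-in for the schema-free index (the paper uses Apache Solr)."""
    extractors: Dict[str, Extractor] = field(default_factory=dict)
    records: List[Dict[str, str]] = field(default_factory=list)

    def register(self, suffix: str, extractor: Extractor) -> None:
        """Plug in extraction logic for one file extension."""
        self.extractors[suffix] = extractor

    def ingest(self, path: Path) -> Dict[str, str]:
        """Extract metadata from one data product and index it."""
        record = {"path": str(path)}
        extractor = self.extractors.get(path.suffix)
        if extractor is not None:
            record.update(extractor(path))  # any field names are accepted
        self.records.append(record)
        return record

    def scan(self, directory: Path) -> List[Dict[str, str]]:
        """Poll a directory and ingest files not seen before (a simple monitor)."""
        seen = {r["path"] for r in self.records}
        new = [p for p in sorted(directory.iterdir())
               if p.is_file() and str(p) not in seen]
        return [self.ingest(p) for p in new]

    def search(self, field_name: str, term: str,
               mode: str = "substring") -> List[Dict[str, str]]:
        """Linear-scan matching; a real index would avoid scanning every record."""
        tests = {
            "prefix": str.startswith,
            "suffix": str.endswith,
            "substring": str.__contains__,
        }
        test = tests[mode]
        return [r for r in self.records
                if field_name in r and test(r[field_name], term)]


def key_value_extractor(path: Path) -> Dict[str, str]:
    """Toy extractor: reads 'key=value' lines from a text file."""
    meta: Dict[str, str] = {}
    for line in path.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def demo() -> Dict[str, int]:
    """Index one made-up data product and count hits for each query mode."""
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        (directory / "run1.out").write_text("molecule=benzene\nbasis_set=6-31G\n")
        catalog = Catalog()
        catalog.register(".out", key_value_extractor)
        first = catalog.scan(directory)
        second = catalog.scan(directory)  # nothing new on the second poll
        return {
            "first_scan": len(first),
            "second_scan": len(second),
            "prefix": len(catalog.search("molecule", "benz", "prefix")),
            "suffix": len(catalog.search("molecule", "zene", "suffix")),
            "substring": len(catalog.search("basis_set", "31")),
        }
```

Because each record is a plain dictionary, a new extractor can introduce a metadata field that no earlier record has, which is the property the abstract calls schema independence. The linear scan in `search` is only for illustration and says nothing about the performance figures reported above.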