Framework for Efficient Indexing and Searching of Scientific Metadata

2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. Pub Date: 2010-05-17. DOI: 10.1109/CCGRID.2010.120

Chaitali Gupta, M. Govindaraju


Citation count: 2

Abstract

A seamless, intuitive data reduction capability for the vast amount of scientific metadata generated by experiments is critical to ensuring that domain-specific scientists can use the data effectively. The portal environments and scientific gateways that scientists currently use offer search capability limited to the pre-defined pull-down menus and conditions set in the portal interface. As a result, effective data reduction is currently possible only for scientists who have developed expertise in complex and disparate query languages. A common theme in our discussions with scientists is that a data reduction capability comparable to web search in ease of use, scalability, and freshness and accuracy of results is a critical need, one that can greatly enhance the productivity and quality of scientific research. Most existing search tools are designed for exact string matching, but exact matches are highly unlikely given the nature of instrument-produced metadata and users' inability to recall exact numbers when searching very large datasets. This paper presents research on locating metadata of interest within a range of values. To meet this goal, we leverage the use of XML in metadata descriptions of scientific datasets, specifically the NeXus datasets generated by the SNS scientists. We have designed a scalable indexing structure for processing data reduction queries. Web semantics and ontology-based methodologies are also employed to provide end users with an elegant, intuitive, and powerful data reduction interface based on free-form queries.
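The abstract does not describe the indexing structure itself, so the following is only a minimal sketch of the general idea of range search over XML metadata, not the authors' design. It assumes each metadata record is a small XML document whose numeric leaf elements can be indexed by element path. The class name `RangeIndex`, the element names (`run`, `temperature`, `title`), and the sample values are all hypothetical. The sketch keeps a sorted list of values per path and answers range queries with binary search.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import defaultdict


class RangeIndex:
    """Per-field sorted index of numeric metadata values (illustrative only)."""

    def __init__(self):
        self._entries = defaultdict(list)  # path -> [(value, doc_id), ...]
        self._keys = {}                    # path -> sorted values, built lazily
        self._dirty = False

    def add_document(self, doc_id, xml_text):
        """Parse one XML metadata record and index its numeric leaves."""
        root = ET.fromstring(xml_text)
        self._walk(root, root.tag, doc_id)
        self._dirty = True

    def _walk(self, elem, path, doc_id):
        children = list(elem)
        if not children:
            try:
                value = float((elem.text or "").strip())
            except ValueError:
                return  # non-numeric leaf: not usable for range search
            self._entries[path].append((value, doc_id))
            return
        for child in children:
            self._walk(child, f"{path}/{child.tag}", doc_id)

    def _finalize(self):
        # Sort each field once so that queries can use binary search.
        for path, entries in self._entries.items():
            entries.sort()
            self._keys[path] = [value for value, _ in entries]
        self._dirty = False

    def query(self, path, low, high):
        """Return ids of documents whose value at `path` lies in [low, high]."""
        if self._dirty:
            self._finalize()
        keys = self._keys.get(path, [])
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [doc_id for _, doc_id in self._entries.get(path, [])[start:end]]


# Hypothetical records; real NeXus metadata is far richer than this.
index = RangeIndex()
index.add_document("run1", "<run><temperature>295.5</temperature><title>calib</title></run>")
index.add_document("run2", "<run><temperature>310.0</temperature></run>")
index.add_document("run3", "<run><temperature>280.25</temperature></run>")

print(index.query("run/temperature", 290, 300))  # ['run1']
```

With this layout a query costs two binary searches plus the size of the result, so a user can ask for "temperature between 290 and 300" without recalling any exact recorded value. A production index would also need persistence, incremental updates, and handling of units and repeated elements, none of which this sketch attempts.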

