PADS: an end-to-end system for processing ad hoc data

Mark Daly, Y. Mandelbaum, D. Walker, M. Fernández, Kathleen Fisher, R. Gruber, Xuan Zheng

Proceedings of the 2006 ACM SIGMOD international conference on Management of data, 2006-06-27
DOI: 10.1145/1142473.1142568 (https://doi.org/10.1145/1142473.1142568)
Citations: 9

Abstract

Enormous amounts of data exist in "well-behaved" formats such as relational tables and XML, which come equipped with extensive tool support. However, vast amounts of data also exist in non-standard or ad hoc data formats, which often lack standard or extensible tools. This deficiency forces data analysts to implement their own tools for parsing, querying, and analyzing their ad hoc data. The resulting tools typically interleave parsing, querying, and analysis, obscuring the semantics of the data format and making it nearly impossible for others to reuse the tools. This proposal describes PADS, an end-to-end system for processing ad hoc data sources. The core of PADS is a declarative language for describing ad hoc data sources and a data-description compiler that produces customizable libraries for parsing the ad hoc data. A suite of tools built around this core includes statistical data-profiling tools, a query engine that permits viewing ad hoc sources as XML and querying them with XQuery, and an interactive front-end that helps users produce PADS descriptions quickly.
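
To make the idea of separating the format description from the tools that consume it more concrete, the following is a minimal Python sketch. It is not PADS syntax and does not use any PADS API: the names `Field`, `compile_description`, `LOG_ENTRY`, `parse_entry`, and `error_rate`, as well as the `host|status|bytes` record layout, are hypothetical and chosen only for illustration. The sketch assumes a simple line-oriented format with a single-character separator.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Field:
    """One entry of a declarative record description (hypothetical)."""
    name: str
    convert: Callable[[str], Any]  # expected to raise ValueError on bad input


def compile_description(fields: List[Field], sep: str = "|"):
    """Turn a field list into a parsing function.

    The returned parser never raises on malformed input; it returns
    the parsed record together with a list of error messages.
    """
    def parse(line: str) -> Tuple[Dict[str, Any], List[str]]:
        parts = line.rstrip("\n").split(sep)
        record: Dict[str, Any] = {}
        errors: List[str] = []
        if len(parts) != len(fields):
            errors.append(f"expected {len(fields)} fields, got {len(parts)}")
        for field, raw in zip(fields, parts):
            try:
                record[field.name] = field.convert(raw)
            except ValueError:
                record[field.name] = None
                errors.append(f"bad {field.name}: {raw!r}")
        return record, errors

    return parse


# Assumed example format: host|status|bytes
LOG_ENTRY = [Field("host", str), Field("status", int), Field("bytes", int)]
parse_entry = compile_description(LOG_ENTRY)


def error_rate(lines: Iterable[str]) -> float:
    """A toy profiling tool that reuses the generated parser."""
    rows = list(lines)
    bad = sum(1 for row in rows if parse_entry(row)[1])
    return bad / len(rows) if rows else 0.0
```

Because the description (`LOG_ENTRY`) is data rather than hand-written parsing code, a second tool such as `error_rate` can be built on the same generated parser without restating the format, which is the kind of reuse the abstract argues is lost when parsing, querying, and analysis are interleaved.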