Auditable and reusable crosswalks for fast, scaled integration of scattered tabular data

Gavin Chait
arXiv - CS - Databases, published 2024-09-03. DOI: arxiv-2409.01517 (https://doi.org/arxiv-2409.01517)

Abstract

This paper presents an open-source curatorial toolkit intended to produce well-structured and interoperable data. Curation is divided into discrete components, with a schema-centric focus for auditable restructuring of complex and scattered tabular data to conform to a destination schema. Task separation allows development of software and analysis without source data being present. Transformations are captured as high-level sequential scripts describing schema-to-schema mappings, reducing complexity and resource requirements. Ultimately, data are transformed, but the objective is that any data meeting a schema definition can be restructured using a crosswalk. The toolkit is available both as a Python package and as a 'no-code' visual web application. A visual example is presented, derived from a longitudinal study where scattered source data from hundreds of local councils are integrated into a single database.
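The idea of a crosswalk as a sequential, schema-to-schema script can be illustrated with a minimal sketch. This is not the toolkit's API: the schemas, the field names, the `RENAME` and `SCALE` actions, and the helper functions below are all hypothetical, chosen only to show how a mapping might be defined and checked against schemas before any source data are present, then applied to any rows that meet the source schema.

```python
import json

# Hypothetical schemas: here just lists of field names. A real schema would
# also carry types and constraints.
SOURCE_SCHEMA = ["Council", "Prop Ref", "Rates (GBP)"]
DESTINATION_SCHEMA = ["authority", "property_id", "rates_pence"]

# A crosswalk is plain data: an ordered list of steps, each naming an action,
# the destination field it fills, and the source field it reads.
# The action names are assumptions made for this sketch.
CROSSWALK = [
    {"action": "RENAME", "to": "authority", "from": "Council"},
    {"action": "RENAME", "to": "property_id", "from": "Prop Ref"},
    {"action": "SCALE", "to": "rates_pence", "from": "Rates (GBP)", "factor": 100},
]


def check_crosswalk(crosswalk, source_schema, destination_schema):
    """Validate the crosswalk against the two schemas; no data required."""
    for step in crosswalk:
        if step["from"] not in source_schema:
            raise ValueError(f"unknown source field: {step['from']}")
        if step["to"] not in destination_schema:
            raise ValueError(f"unknown destination field: {step['to']}")
    filled = {step["to"] for step in crosswalk}
    missing = set(destination_schema) - filled
    if missing:
        raise ValueError(f"destination fields never filled: {sorted(missing)}")


def conforms(rows, schema):
    """True if every row has exactly the fields named in the schema."""
    return all(set(row) == set(schema) for row in rows)


def apply_crosswalk(rows, crosswalk, source_schema, destination_schema):
    """Restructure rows that meet the source schema into the destination schema."""
    check_crosswalk(crosswalk, source_schema, destination_schema)
    if not conforms(rows, source_schema):
        raise ValueError("rows do not conform to the source schema")
    restructured = []
    for row in rows:
        new_row = {}
        for step in crosswalk:  # steps run in the order they are written
            value = row[step["from"]]
            if step["action"] == "RENAME":
                new_row[step["to"]] = value
            elif step["action"] == "SCALE":
                new_row[step["to"]] = round(float(value) * step["factor"])
            else:
                raise ValueError(f"unsupported action: {step['action']}")
        restructured.append(new_row)
    return restructured


# Made-up example rows, for illustration only.
rows = [{"Council": "Example Council", "Prop Ref": "A1", "Rates (GBP)": "12.50"}]
result = apply_crosswalk(rows, CROSSWALK, SOURCE_SCHEMA, DESTINATION_SCHEMA)

# Because the crosswalk is data rather than code, it can be stored next to the
# output as an audit record and reused on other data meeting the same schema.
audit_record = json.dumps(CROSSWALK, indent=2)
```

In this sketch, `check_crosswalk` needs only the two schemas, which mirrors the task separation described in the abstract: the mapping can be written and reviewed without access to the source data, and the serialised `audit_record` documents exactly which steps produced the restructured rows.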