Data catalog tools: A systematic multivocal literature review

Impact factor 4.1 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Software Engineering)
Marco Tonnarelli, Indika Kumara, Stefan Driessen, Damian Andrew Tamburri, Willem-Jan van den Heuvel, Patrick Oor
Journal of Systems and Software, volume 230, Article 112584. Journal article, published 2025-08-05.
DOI: 10.1016/j.jss.2025.112584
https://www.sciencedirect.com/science/article/pii/S0164121225002535
Citations: 0

Abstract

A data catalog enables an organization to maintain an inventory of its data assets by collecting and managing the relevant metadata. We conducted a systematic multivocal literature review of data catalogs to understand their features and usage, selecting and analyzing 86 literature sources and 39 catalog tools. From the literature findings, we first developed a classification framework comprising 24 fine-grained features, five high-level features, and three maturity levels. We then analyzed the 39 tools against this framework. Organizations typically include a data catalog as a component of their big data platforms and use it to support the various phases of the metadata management lifecycle. We therefore also mapped the catalog features to the requirements of metadata-driven big data architectures, namely data mesh, data lake, and data lakehouse, and to the phases of a metadata management lifecycle. Our findings should help organizations make informed decisions when choosing data catalog tools and help researchers identify critical research issues in data cataloging and metadata management.
Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
Source journal: Journal of Systems and Software
Category: Engineering & Technology, Computer Science: Theory & Methods
CiteScore: 8.60
Self-citation rate: 5.70%
Articles published: 193
Review time: 16 weeks

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.