Marco Tonnarelli, Indika Kumara, Stefan Driessen, Damian Andrew Tamburri, Willem-Jan van den Heuvel, Patrick Oor
Journal of Systems and Software, volume 230, Article 112584 (published 2025-08-05). DOI: 10.1016/j.jss.2025.112584. https://www.sciencedirect.com/science/article/pii/S0164121225002535
Data catalog tools: A systematic multivocal literature review
A data catalog enables an organization to maintain an inventory of its data assets by collecting and managing the relevant metadata. We conducted a systematic multivocal literature review of data catalogs to understand their features and usage, systematically selecting and analyzing 86 literature sources and 39 catalog tools. We first used the findings from the literature to develop a classification framework comprising 24 fine-grained features, five high-level features, and three maturity levels. Next, we analyzed the 39 tools against this framework. Organizations typically include a data catalog as a component of their big data platforms and use it to support the various phases of the metadata management lifecycle. Hence, we also mapped the catalog features to the requirements of metadata-driven big data architectures, namely data mesh, data lake, and data lakehouse, and to the phases of a metadata management lifecycle. Our findings can help organizations make informed decisions when choosing data catalog tools and help researchers identify critical research issues in data cataloging and metadata management.
Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.