Automated mapping between SDG indicators and open data: An LLM-augmented knowledge graph approach

IF 2.7, Zone 3 (Computer Science), JCR Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering, Pub Date: 2025-01-03, DOI: 10.1016/j.datak.2024.102405

Wissal Benjira, Faten Atigui, Bénédicte Bucher, Malika Grim-Yefsah, Nicolas Travers

Volume 156, Article 102405. Journal Article. URL: https://www.sciencedirect.com/science/article/pii/S0169023X24001290


Abstract

Meeting the Sustainable Development Goals (SDGs) presents a large-scale challenge for all countries. The SDGs, established by the United Nations, provide a comprehensive framework for addressing global issues. To monitor progress towards these goals, we need to develop key performance indicators and to integrate and analyze heterogeneous datasets. The definition of these indicators requires the use of existing data and metadata. However, the diversity of data sources and formats raises major issues in terms of structuring and integration. Despite the abundance of open data and metadata, their exploitation remains limited, leaving untapped potential for guiding urban policies towards sustainability. Thus, this paper introduces a novel approach for SDG indicator computation, leveraging the capabilities of Large Language Models (LLMs) and Knowledge Graphs (KGs). We propose a method that combines rule-based filtering with LLM-powered schema mapping to establish semantic correspondences between diverse data sources and SDG indicators, including disaggregation. Our approach integrates these mappings into a KG, which enables indicator computation by querying the graph's topology. We evaluate our method through a case study focusing on SDG Indicator 11.7.1, which concerns the accessibility of public open spaces. Our experimental results show significant improvements in accuracy, precision, recall, and F1-score compared to traditional schema mapping techniques.
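The pipeline outlined in the abstract (rule-based filtering, then schema mapping, then a queryable graph, then evaluation against a reference mapping) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The concept names, dataset columns, filter rules, and threshold are all hypothetical, and a simple token-overlap score stands in for the LLM judgment so that the example runs without any external service. The "graph" is just a list of (subject, predicate, object) triples.

```python
from dataclasses import dataclass

# Hypothetical indicator concepts (not taken from the paper), each with a few
# descriptive terms used by the stand-in scorer below.
INDICATOR_CONCEPTS = {
    "open_space_area": {"open", "space", "area", "park"},
    "built_up_area": {"built", "area", "urban"},
    "age_group": {"age", "group"},
}

@dataclass(frozen=True)
class Column:
    dataset: str
    name: str
    dtype: str  # "numeric", "category", or "text"

def tokens(name):
    """Split a snake_case column name into lowercase tokens."""
    return set(name.lower().split("_"))

def rule_filter(columns):
    """Assumed rules: drop free-text columns and identifier-like columns."""
    return [c for c in columns if c.dtype != "text" and "id" not in tokens(c.name)]

def score(concept_terms, column):
    """Stand-in for the LLM judgment: Jaccard overlap of name tokens."""
    t = tokens(column.name)
    return len(t & concept_terms) / len(t | concept_terms)

def map_columns(columns, threshold=0.3):
    """Emit one (column, 'mapsTo', concept) triple per column that clears the threshold."""
    triples = []
    for col in rule_filter(columns):
        best, best_score = max(
            ((concept, score(terms, col)) for concept, terms in INDICATOR_CONCEPTS.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            triples.append((f"{col.dataset}.{col.name}", "mapsTo", best))
    return triples

def columns_for(triples, concept):
    """Query the triple list: which columns map to a given concept?"""
    return [s for (s, p, o) in triples if p == "mapsTo" and o == concept]

def prf1(predicted, gold):
    """Precision, recall, and F1 of predicted mappings against a gold set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical open-data columns.
columns = [
    Column("parks", "park_area", "numeric"),
    Column("parks", "park_id", "numeric"),
    Column("parks", "description", "text"),
    Column("landuse", "built_up_area", "numeric"),
    Column("census", "age_group", "category"),
    Column("census", "household_income", "numeric"),
]
triples = map_columns(columns)
```

In this sketch, the filter removes `park_id` and `description` before any scoring happens, which is the role a rule-based stage would play ahead of a more expensive LLM call. Swapping `score` for a real model call would leave the rest of the structure unchanged, and `prf1` shows how mapping quality could be compared against a manually built reference mapping.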



Source journal

Data & Knowledge Engineering (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore

5.00

Self-citation rate

0.00%

Review time

6 months

Journal description: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between the two related fields of data engineering and knowledge engineering. DKE reaches a worldwide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of data and knowledge systems.