用Façade构造知识图：一种访问Web上异构数据源的统一方法

IF 4.1 3区计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology Pub Date : 2022-11-04 DOI:10.1145/3555312

Luigi Asprino, E. Daga, Aldo Gangemi, P. Mulholland

{"title":"用Façade构造知识图：一种访问Web上异构数据源的统一方法","authors":"Luigi Asprino, E. Daga, Aldo Gangemi, P. Mulholland","doi":"10.1145/3555312","DOIUrl":null,"url":null,"abstract":"Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON), or formats specific to a given application (for example, BibTex, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks result in complex data processing pipelines, which mix structural and semantic mappings, whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, thus making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability of interacting with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2]. In this article, we study a unified method for data access to heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.","PeriodicalId":50911,"journal":{"name":"ACM Transactions on Internet Technology","volume":"23 1","pages":"1 - 31"},"PeriodicalIF":4.1000,"publicationDate":"2022-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Knowledge Graph Construction with a Façade: A Unified Method to Access Heterogeneous Data Sources on the Web\",\"authors\":\"Luigi Asprino, E. Daga, Aldo Gangemi, P. Mulholland\",\"doi\":\"10.1145/3555312\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON), or formats specific to a given application (for example, BibTex, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks result in complex data processing pipelines, which mix structural and semantic mappings, whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, thus making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability of interacting with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2]. In this article, we study a unified method for data access to heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.\",\"PeriodicalId\":50911,\"journal\":{\"name\":\"ACM Transactions on Internet Technology\",\"volume\":\"23 1\",\"pages\":\"1 - 31\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2022-11-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Internet Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3555312\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Internet Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3555312","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 14

摘要

数据集成是RDF知识图的主要用例。然而，Web资源的格式具有弱语义（例如，CSV和JSON），或者特定于给定应用程序的格式（例如，BibTex、HTML和Markdown）。为了解决这个问题，知识图构建（KGC）正因其专注于支持用户将数据转换为RDF而获得发展势头。然而，使用现有的KGC框架会产生复杂的数据处理管道，这些管道混合了结构和语义映射，其开发和维护对KG工程师来说是一个重要的瓶颈。这样的框架迫使用户依赖不同的工具，有时基于异构语言，来检查源代码、设计映射和生成三元组，从而使过程变得不必要地复杂。我们认为，依靠KG工程师在RDF和公认的SPARQL查询语言[2]方面的专业知识，让他们具备与Web数据格式交互的能力是可能的，也是可取的。在本文中，我们研究了一种使用Facade-X对异构数据源进行数据访问的统一方法，Facade-X是一种在名为SPARQLAnything的新数据集成系统中实现的元模型。我们证明了我们的方法在理论上是合理的，因为它允许基于RDF的单个元模型来表示（a）任何可以用BNF语法表达的文件格式的数据，以及（b）任何关系数据库的数据。在可用性（映射的认知复杂性）和一般性能方面，我们将我们的方法与最先进的方法进行了比较。最后，我们通过与参考用户社区的接触，讨论了这种新方法的好处和挑战。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Knowledge Graph Construction with a Façade: A Unified Method to Access Heterogeneous Data Sources on the Web

Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON), or formats specific to a given application (for example, BibTex, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks result in complex data processing pipelines, which mix structural and semantic mappings, whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, thus making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability of interacting with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2]. In this article, we study a unified method for data access to heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Internet Technology 工程技术-计算机：软件工程

CiteScore

10.30

自引率

1.90%

发文量

137

审稿时长

>12 weeks

期刊介绍： ACM Transactions on Internet Technology (TOIT) brings together many computing disciplines including computer software engineering, computer programming languages, middleware, database management, security, knowledge discovery and data mining, networking and distributed systems, communications, performance and scalability etc. TOIT will cover the results and roles of the individual disciplines and the relationshipsamong them.