Schema-level Index Models for Web Data Search

J. Data Intell., published 2021-03-01, DOI: 10.26421/JDI2.1-3

A. Scherp, Till Blume

Citations: 3

Abstract

Indexing the Web of Data offers many opportunities, in particular for finding and exploring data sources. A major design decision when indexing the Web of Data is choosing a suitable index model, i.e., deciding how to index and summarize the data. Various index models have been developed for specific tasks. Because each index model has been designed, implemented, and evaluated independently, it remains difficult to judge whether an approach generalizes to another task, set of queries, or dataset. In this work, we empirically evaluate six representative index models, each with a unique combination of features. Among them is a new index model that incorporates inference over RDFS and owl:sameAs. For the first time, we implement all index models in a single, stream-based framework. We evaluate variants of the index models that consider sub-graphs of 0, 1, and 2 hops on two large, real-world datasets. We assess the quality of the indices in terms of compression ratio, summarization ratio, and F1-score, where the F1-score measures the approximation quality of the stream-based index computation. The experiments reveal large variations in compression ratio, summarization ratio, and approximation quality across index models, queries, and datasets. However, we observe meaningful correlations in the results that help determine the right index model for a given task, type of query, and dataset.
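
The abstract names a k-hop schema-level index, a summarization ratio, and an F1-score for stream-based computation without defining them on this page. The Python sketch below illustrates these ideas on toy data and should not be read as the paper's method. Everything in it is an assumption made for illustration: a schema element is taken to be a subject's set of rdf:type values plus its outgoing property labels, optionally extended with its neighbours' schema elements up to k hops; the summarization ratio is taken as distinct schema elements per summarized subject; and the F1-score compares (subject, schema element) pairs from a bounded-cache stream computation against a full batch computation. The names schema_elements, stream_index, summarization_ratio, f1 and the ex:/foaf: triples are hypothetical.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "rdf:type"  # compact IRI used only for this toy example


def schema_elements(triples, k=0):
    """Batch ("gold") index: map every subject to a hypothetical k-hop schema element."""
    types = defaultdict(set)  # subject -> set of rdf:type objects
    edges = defaultdict(set)  # subject -> set of (property, object) for other triples
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            edges[s].add((p, o))
    subjects = set(types) | set(edges)

    def element(node, depth):
        # 0-hop part: the node's own type set and outgoing property labels.
        own = (frozenset(types[node]), frozenset(p for p, _ in edges[node]))
        if depth == 0:
            return own
        # k-hop part: pair each property with the neighbour's (depth-1)-hop element.
        neighbours = frozenset((p, element(o, depth - 1)) for p, o in edges[node])
        return own + (neighbours,)

    return {s: element(s, k) for s in subjects}


def summarization_ratio(index):
    """Assumed definition: distinct schema elements per summarized subject."""
    return len(set(index.values())) / len(index)


def stream_index(triples, cache_size):
    """Hypothetical stream-based 0-hop index with a bounded LRU cache of subjects.

    When the cache is full, the least recently used subject is evicted and its
    (possibly incomplete) schema element is emitted; later triples about that
    subject start a fresh, partial record.
    """
    cache = OrderedDict()  # subject -> (type set, property set)
    emitted = set()        # (subject, schema element) pairs produced by the stream
    for s, p, o in triples:
        if s in cache:
            cache.move_to_end(s)  # mark as most recently used
        else:
            if len(cache) >= cache_size:
                old, (old_types, old_props) = cache.popitem(last=False)
                emitted.add((old, (frozenset(old_types), frozenset(old_props))))
            cache[s] = (set(), set())
        s_types, s_props = cache[s]
        if p == RDF_TYPE:
            s_types.add(o)
        else:
            s_props.add(p)
    for s, (s_types, s_props) in cache.items():  # flush whatever is still cached
        emitted.add((s, (frozenset(s_types), frozenset(s_props))))
    return emitted


def f1(gold, emitted):
    """F1 over (subject, schema element) pairs: stream output vs. batch gold index."""
    gold_pairs = set(gold.items())
    tp = len(gold_pairs & emitted)
    if tp == 0:
        return 0.0
    precision = tp / len(emitted)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


TRIPLES = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:carol", RDF_TYPE, "foaf:Person"),
    ("ex:carol", "foaf:name", '"Carol"'),
    ("ex:alice", "foaf:name", '"Alice"'),  # late triple about alice
]

if __name__ == "__main__":
    gold = schema_elements(TRIPLES, k=0)
    print(summarization_ratio(gold))                       # 2 elements / 3 subjects
    print(f1(gold, stream_index(TRIPLES, cache_size=2)))   # alice is split, F1 = 4/7
    print(f1(gold, stream_index(TRIPLES, cache_size=10)))  # nothing evicted, F1 = 1.0
```

With a cache of two subjects, ex:alice is evicted before its foaf:name triple arrives, so the stream emits two partial schema elements for it. An F1-score computed against a batch-built index is meant to expose this kind of approximation error. RDFS and owl:sameAs inference and the compression ratio are not modelled in this sketch.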
