PAPAYA: A library for performance analysis of SQL-based RDF processing systems

Semantic Web. Pub Date: 2024-04-05. DOI: 10.3233/sw-243582

Mohamed Ragab, Adam Satria Adidarma, Riccardo Tommasini


Abstract

Prescriptive Performance Analysis (PPA) has been shown to be more useful than traditional descriptive and diagnostic analyses for making sense of the performance of Big Data (BD) frameworks. In practice, when processing large (RDF) graphs on top of relational BD systems, several design decisions emerge that cannot be made automatically, e.g., the choice of schema, partitioning technique, and storage format. PPA, and ranking functions in particular, yields actionable insights from performance data, making it easier for practitioners to choose the best way to deploy BD frameworks, especially for graph processing. However, the amount of experimental work required to implement PPA is still huge. In this paper, we present PAPAYA (https://github.com/DataSystemsGroupUT/PAPyA), a library for implementing PPA that (1) prepares RDF graph data for a processing pipeline over relational BD systems; (2) enables automatic ranking of performance in a user-defined solution space of experimental dimensions; and (3) allows flexible user-defined extensions in terms of the systems to test and the ranking methods. We showcase PAPAYA on a set of experiments based on the SparkSQL framework. PAPAYA simplifies the performance analytics of BD systems for processing large (RDF) graphs. We provide PAPAYA as a public open-source library under an MIT license, to serve as a catalyst for new research on prescriptive analytical techniques for BD applications.
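To make the idea of ranking a solution space of experimental dimensions concrete, the following is a minimal sketch in plain Python. It does not use PAPAYA's API and is not the library's ranking method. The scoring rule (a normalized sum of per-query rank positions), the function name `rank_score`, the configuration labels, and all runtimes are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical runtimes in seconds, keyed by query and then by configuration.
# Each configuration is a (schema, partitioning, storage format) triple;
# the labels and numbers are invented for this example.
CFG_A = ("single-table", "horizontal", "parquet")
CFG_B = ("vertical", "subject-based", "orc")
CFG_C = ("property-table", "predicate-based", "avro")

RUNTIMES = {
    "Q1": {CFG_A: 12.0, CFG_B: 9.5, CFG_C: 15.2},
    "Q2": {CFG_A: 30.1, CFG_B: 41.0, CFG_C: 35.7},
    "Q3": {CFG_A: 6.1, CFG_B: 6.8, CFG_C: 8.9},
}


def rank_score(runtimes):
    """Score each configuration in [0, 1]; higher means it ranked better.

    Assumed rule: for every query, sort configurations by runtime (fastest
    first), award (d - rank) points to the configuration at each rank, and
    normalize by the maximum achievable total, n_queries * (d - 1).
    Every query must report the same set of at least two configurations.
    """
    n_queries = len(runtimes)
    d = len(next(iter(runtimes.values())))
    if d < 2:
        raise ValueError("need at least two configurations to rank")

    totals = defaultdict(float)
    for per_query in runtimes.values():
        ordered = sorted(per_query, key=per_query.get)  # fastest first
        for rank, cfg in enumerate(ordered, start=1):
            totals[cfg] += d - rank

    return {cfg: total / (n_queries * (d - 1)) for cfg, total in totals.items()}


scores = rank_score(RUNTIMES)
best = max(scores, key=scores.get)
for cfg, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {cfg}")
```

A score of 1 would mean a configuration was fastest on every query, and 0 that it was slowest on every query. Other rules, such as ranking one dimension at a time or combining several criteria, could be substituted without changing the surrounding structure.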
