PAPAYA: A library for performance analysis of SQL-based RDF processing systems

Semantic Web, Pub Date: 2024-04-05, DOI: 10.3233/sw-243582
Mohamed Ragab, Adam Satria Adidarma, Riccardo Tommasini

Abstract

Prescriptive Performance Analysis (PPA) has been shown to be more useful than traditional descriptive and diagnostic analyses for making sense of the performance of Big Data (BD) frameworks. In practice, when processing large (RDF) graphs on top of relational BD systems, several design decisions emerge that cannot be made automatically, e.g., the choice of the schema, the partitioning technique, and the storage format. PPA, and in particular ranking functions, helps derive actionable insights from performance data, making it easier for practitioners to choose the best way to deploy BD frameworks, especially for graph processing. However, the amount of experimental work required to implement PPA is still huge. In this paper, we present PAPAYA (https://github.com/DataSystemsGroupUT/PAPyA), a library for implementing PPA that (1) prepares RDF graph data for a processing pipeline over relational BD systems, (2) enables automatic ranking of performance in a user-defined solution space of experimental dimensions, and (3) allows flexible user-defined extensions in terms of the systems to test and the ranking methods. We showcase PAPAYA on a set of experiments based on the SparkSQL framework. PAPAYA simplifies the performance analytics of BD systems for processing large (RDF) graphs. We provide PAPAYA as a public open-source library under an MIT license, intended as a catalyst for designing new prescriptive analytical techniques for BD applications.
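To make the idea of ranking configurations in a solution space more concrete, the following is a minimal Python sketch. It is not PAPAYA's API and not necessarily the ranking function used in the paper: the function name `rank_configurations`, the configuration labels, and the runtimes are all hypothetical, made up purely for illustration. Each configuration is a point in a three-dimensional space (schema, partitioning technique, storage format). For every query, configurations are ordered by runtime and receive a normalized score, 1.0 for the fastest and 0.0 for the slowest, which is then averaged over all queries.

```python
from collections import defaultdict


def rank_configurations(runtimes):
    """Rank configurations by their average normalized per-query rank.

    runtimes: {configuration: {query_name: runtime_in_seconds}}
    Returns a list of (configuration, score) pairs, best first.
    This is an illustrative scoring rule, not PAPAYA's implementation.
    """
    configs = list(runtimes)
    queries = sorted({q for per_query in runtimes.values() for q in per_query})
    d = len(configs)
    if d < 2 or not queries:
        raise ValueError("need at least two configurations and one query")

    scores = defaultdict(float)
    for query in queries:
        # Fastest configuration first for this query.
        ordered = sorted(configs, key=lambda c: runtimes[c][query])
        for position, config in enumerate(ordered, start=1):
            # Position 1 contributes 1.0, position d contributes 0.0.
            scores[config] += (d - position) / (d - 1)

    averaged = [(c, scores[c] / len(queries)) for c in configs]
    return sorted(averaged, key=lambda item: item[1], reverse=True)


# Hypothetical runtimes (seconds); labels and numbers are invented.
runtimes = {
    ("schema-A", "partition-X", "format-1"): {"Q1": 12.0, "Q2": 30.0},
    ("schema-B", "partition-Y", "format-2"): {"Q1": 8.0, "Q2": 25.0},
    ("schema-C", "partition-Z", "format-3"): {"Q1": 10.0, "Q2": 40.0},
}

ranking = rank_configurations(runtimes)
for config, score in ranking:
    print(config, round(score, 2))
```

A single aggregated score like this turns a table of raw runtimes into a direct recommendation, which is the prescriptive step the abstract describes; a real analysis would typically also examine each dimension separately and handle ties and missing measurements explicitly.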