ArcaDB:异构计算环境的分类查询引擎。

Kristalys Ruiz-Rohena, Manuel Rodriguez-Martínez
{"title":"ArcaDB:异构计算环境的分类查询引擎。","authors":"Kristalys Ruiz-Rohena, Manuel Rodriguez-Martínez","doi":"10.1109/cloud62652.2024.00015","DOIUrl":null,"url":null,"abstract":"<p><p>Modern enterprises rely on data management systems to collect, store, and analyze vast amounts of data related to their operations. Nowadays, clusters and hardware accelerators (e.g., GPUs, TPUs) have become a necessity to scale with the data processing demands in many applications related to social media, bioinformatics, surveillance systems, remote sensing, and medical informatics. Given this new scenario, the architecture of data analytics engines must evolve to take advantage of these new technological trends. In this paper, we present ArcaDB: a disaggregated query engine that leverages container technology to place operators at compute nodes that fit their performance profile. In ArcaDB, a query plan is dispatched to worker nodes that have different computing characteristics. Each operator is annotated with the preferred type of compute node for execution, and ArcaDB ensures that the operator gets picked up by the appropriate workers. We have implemented a prototype version of ArcaDB using Java, Python, and Docker containers. We have also completed a preliminary performance study of this prototype, using images and scientific data. This study shows that ArcaDB can speed up query performance by a factor of 3.5x in comparison with a shared-nothing, symmetric arrangement.</p>","PeriodicalId":93366,"journal":{"name":"Proceedings. IEEE International Conference on Cloud Computing","volume":"2024 ","pages":"42-53"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529753/pdf/","citationCount":"0","resultStr":"{\"title\":\"ArcaDB: A Disaggregated Query Engine for Heterogenous Computational Environments.\",\"authors\":\"Kristalys Ruiz-Rohena, Manuel Rodriguez-Martínez\",\"doi\":\"10.1109/cloud62652.2024.00015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Modern enterprises rely on data management systems to collect, store, and analyze vast amounts of data related to their operations. Nowadays, clusters and hardware accelerators (e.g., GPUs, TPUs) have become a necessity to scale with the data processing demands in many applications related to social media, bioinformatics, surveillance systems, remote sensing, and medical informatics. Given this new scenario, the architecture of data analytics engines must evolve to take advantage of these new technological trends. In this paper, we present ArcaDB: a disaggregated query engine that leverages container technology to place operators at compute nodes that fit their performance profile. In ArcaDB, a query plan is dispatched to worker nodes that have different computing characteristics. Each operator is annotated with the preferred type of compute node for execution, and ArcaDB ensures that the operator gets picked up by the appropriate workers. We have implemented a prototype version of ArcaDB using Java, Python, and Docker containers. We have also completed a preliminary performance study of this prototype, using images and scientific data. This study shows that ArcaDB can speed up query performance by a factor of 3.5x in comparison with a shared-nothing, symmetric arrangement.</p>\",\"PeriodicalId\":93366,\"journal\":{\"name\":\"Proceedings. IEEE International Conference on Cloud Computing\",\"volume\":\"2024 \",\"pages\":\"42-53\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11529753/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. IEEE International Conference on Cloud Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/cloud62652.2024.00015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/8/28 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Conference on Cloud Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/cloud62652.2024.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/28 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

现代企业依赖数据管理系统来收集、存储和分析与其运营相关的大量数据。如今,集群和硬件加速器(如 GPU、TPU)已成为许多与社交媒体、生物信息学、监控系统、遥感和医疗信息学相关的应用扩展数据处理需求的必需品。在这种新形势下,数据分析引擎的架构必须不断发展,以利用这些新的技术趋势。在本文中,我们介绍了 ArcaDB:一种分解查询引擎,它利用容器技术将操作员放置在适合其性能配置的计算节点上。在 ArcaDB 中,查询计划被分派到具有不同计算特性的工作节点上。每个运算符都会被注释为执行时首选的计算节点类型,ArcaDB 会确保运算符被相应的工作节点接收。我们使用 Java、Python 和 Docker 容器实现了 ArcaDB 的原型版本。我们还使用图像和科学数据完成了对该原型的初步性能研究。这项研究表明,与无共享的对称安排相比,ArcaDB 可将查询性能提高 3.5 倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
ArcaDB: A Disaggregated Query Engine for Heterogenous Computational Environments.

Modern enterprises rely on data management systems to collect, store, and analyze vast amounts of data related to their operations. Nowadays, clusters and hardware accelerators (e.g., GPUs, TPUs) have become a necessity to scale with the data processing demands in many applications related to social media, bioinformatics, surveillance systems, remote sensing, and medical informatics. Given this new scenario, the architecture of data analytics engines must evolve to take advantage of these new technological trends. In this paper, we present ArcaDB: a disaggregated query engine that leverages container technology to place operators at compute nodes that fit their performance profile. In ArcaDB, a query plan is dispatched to worker nodes that have different computing characteristics. Each operator is annotated with the preferred type of compute node for execution, and ArcaDB ensures that the operator gets picked up by the appropriate workers. We have implemented a prototype version of ArcaDB using Java, Python, and Docker containers. We have also completed a preliminary performance study of this prototype, using images and scientific data. This study shows that ArcaDB can speed up query performance by a factor of 3.5x in comparison with a shared-nothing, symmetric arrangement.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信