为支持gpu的分析查询评估扩展核心UDF框架

Qiming Chen, R. Wu, M. Hsu, Bin Zhang
{"title":"为支持gpu的分析查询评估扩展核心UDF框架","authors":"Qiming Chen, R. Wu, M. Hsu, Bin Zhang","doi":"10.1145/2076623.2076641","DOIUrl":null,"url":null,"abstract":"To achieve scalable data intensive analytics, we investigate methods to integrate general purpose analytic computation into a query pipeline using User Defined Functions (UDFs). However, an existing UDF cannot act as a block operator with chunk-wise input along the tuple-wise query processing pipeline, therefore unable to deal with the application semantics definable on the set of incoming tuples representing a single object or falling in a time window, and unable to leverage external computation engines for efficient batch processing.\n To enable the data intensive computation pipeline, we introduce a new kind of UDFs called Set-In Set-Out (SISO) UDFs. A SISO UDF is a block operator for processing the input tuples and returning the resulting tuples chunk by chunk. Operated in the query processing pipeline, a SISO UDF pools a chunk of input tuples, dispatches them to GPUs or an analytic engine in batch, materializes and then streams out the results. This behavior differentiates SISO UDF from all the existing ones, and makes efficient integration of analytic computation and data management feasible. We have implemented the SISO UDF framework by extending the PostgreSQL query engine, and further demonstrated the use of SISO UDF with GPU-enabled analytical query evaluation. Our experiments show that the proposed approach is scalable and efficient.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"341 1","pages":"143-151"},"PeriodicalIF":0.0000,"publicationDate":"2011-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Extend core UDF framework for GPU-enabled analytical query evaluation\",\"authors\":\"Qiming Chen, R. Wu, M. Hsu, Bin Zhang\",\"doi\":\"10.1145/2076623.2076641\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To achieve scalable data intensive analytics, we investigate methods to integrate general purpose analytic computation into a query pipeline using User Defined Functions (UDFs). However, an existing UDF cannot act as a block operator with chunk-wise input along the tuple-wise query processing pipeline, therefore unable to deal with the application semantics definable on the set of incoming tuples representing a single object or falling in a time window, and unable to leverage external computation engines for efficient batch processing.\\n To enable the data intensive computation pipeline, we introduce a new kind of UDFs called Set-In Set-Out (SISO) UDFs. A SISO UDF is a block operator for processing the input tuples and returning the resulting tuples chunk by chunk. Operated in the query processing pipeline, a SISO UDF pools a chunk of input tuples, dispatches them to GPUs or an analytic engine in batch, materializes and then streams out the results. This behavior differentiates SISO UDF from all the existing ones, and makes efficient integration of analytic computation and data management feasible. We have implemented the SISO UDF framework by extending the PostgreSQL query engine, and further demonstrated the use of SISO UDF with GPU-enabled analytical query evaluation. Our experiments show that the proposed approach is scalable and efficient.\",\"PeriodicalId\":93615,\"journal\":{\"name\":\"Proceedings. International Database Engineering and Applications Symposium\",\"volume\":\"341 1\",\"pages\":\"143-151\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-09-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. International Database Engineering and Applications Symposium\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2076623.2076641\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Database Engineering and Applications Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2076623.2076641","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

为了实现可扩展的数据密集型分析,我们研究了使用用户定义函数(udf)将通用分析计算集成到查询管道中的方法。但是,现有的UDF不能作为块操作符,在元组查询处理管道上使用块输入,因此不能处理在表示单个对象或落在时间窗口内的传入元组集合上可定义的应用程序语义,也不能利用外部计算引擎进行高效的批处理。为了实现数据密集型计算管道,我们引入了一种新的udf,称为Set-In - Set-Out (SISO) udf。SISO UDF是一个块操作符,用于处理输入元组并逐块返回结果元组。在查询处理管道中操作,SISO UDF将输入元组的块池化,将它们分批分配给gpu或分析引擎,实现然后输出结果。这种行为使SISO UDF区别于所有现有的UDF,并使分析计算和数据管理的有效集成成为可能。我们通过扩展PostgreSQL查询引擎实现了SISO UDF框架,并进一步演示了在支持gpu的分析查询评估中使用SISO UDF。实验结果表明,该方法具有良好的可扩展性和有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Extend core UDF framework for GPU-enabled analytical query evaluation
To achieve scalable data intensive analytics, we investigate methods to integrate general purpose analytic computation into a query pipeline using User Defined Functions (UDFs). However, an existing UDF cannot act as a block operator with chunk-wise input along the tuple-wise query processing pipeline, therefore unable to deal with the application semantics definable on the set of incoming tuples representing a single object or falling in a time window, and unable to leverage external computation engines for efficient batch processing. To enable the data intensive computation pipeline, we introduce a new kind of UDFs called Set-In Set-Out (SISO) UDFs. A SISO UDF is a block operator for processing the input tuples and returning the resulting tuples chunk by chunk. Operated in the query processing pipeline, a SISO UDF pools a chunk of input tuples, dispatches them to GPUs or an analytic engine in batch, materializes and then streams out the results. This behavior differentiates SISO UDF from all the existing ones, and makes efficient integration of analytic computation and data management feasible. We have implemented the SISO UDF framework by extending the PostgreSQL query engine, and further demonstrated the use of SISO UDF with GPU-enabled analytical query evaluation. Our experiments show that the proposed approach is scalable and efficient.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信