端到端机器学习与Apache AsterixDB

Wail Y. Alkowaileet, Sattam Alsubaiee, M. Carey, Chen Li, H. Ramampiaro, Phanwadee Sinthong, Xikui Wang
{"title":"端到端机器学习与Apache AsterixDB","authors":"Wail Y. Alkowaileet, Sattam Alsubaiee, M. Carey, Chen Li, H. Ramampiaro, Phanwadee Sinthong, Xikui Wang","doi":"10.1145/3209889.3209894","DOIUrl":null,"url":null,"abstract":"Recent developments in machine learning and data science provide a foundation for extracting underlying information from Big Data. Unfortunately, current platforms and tools often require data scientists to glue together and maintain custom-built platforms consisting of multiple Big Data component technologies. In this paper, we explain how Apache AsterixDB, an open source Big Data Management System, can help to reduce the burden involved in using machine learning algorithms in Big Data analytics. In particular, we describe how AsterixDB's built-in support for user-defined functions (UDFs), the availability of UDFs in data ingestion pipelines and queries, and the provision of machine learning platform and notebook inter-operation capabilities can together enable data analysts to more easily create and manage end-to-end analytical dataflows.","PeriodicalId":92710,"journal":{"name":"Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning. Workshop on Data Management for End-to-End Machine Learning (2nd : 2018 : Houston, Tex.)","volume":"10 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"End-to-End Machine Learning with Apache AsterixDB\",\"authors\":\"Wail Y. Alkowaileet, Sattam Alsubaiee, M. Carey, Chen Li, H. Ramampiaro, Phanwadee Sinthong, Xikui Wang\",\"doi\":\"10.1145/3209889.3209894\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent developments in machine learning and data science provide a foundation for extracting underlying information from Big Data. Unfortunately, current platforms and tools often require data scientists to glue together and maintain custom-built platforms consisting of multiple Big Data component technologies. In this paper, we explain how Apache AsterixDB, an open source Big Data Management System, can help to reduce the burden involved in using machine learning algorithms in Big Data analytics. In particular, we describe how AsterixDB's built-in support for user-defined functions (UDFs), the availability of UDFs in data ingestion pipelines and queries, and the provision of machine learning platform and notebook inter-operation capabilities can together enable data analysts to more easily create and manage end-to-end analytical dataflows.\",\"PeriodicalId\":92710,\"journal\":{\"name\":\"Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning. Workshop on Data Management for End-to-End Machine Learning (2nd : 2018 : Houston, Tex.)\",\"volume\":\"10 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning. Workshop on Data Management for End-to-End Machine Learning (2nd : 2018 : Houston, Tex.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3209889.3209894\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning. Workshop on Data Management for End-to-End Machine Learning (2nd : 2018 : Houston, Tex.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3209889.3209894","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

摘要

机器学习和数据科学的最新发展为从大数据中提取底层信息提供了基础。不幸的是,当前的平台和工具通常需要数据科学家粘合在一起,并维护由多种大数据组件技术组成的定制平台。在本文中,我们解释了Apache AsterixDB,一个开源的大数据管理系统,如何帮助减少在大数据分析中使用机器学习算法所带来的负担。特别是,我们描述了AsterixDB对用户定义函数(udf)的内置支持,udf在数据摄取管道和查询中的可用性,以及机器学习平台和笔记本互操作功能的提供如何使数据分析师能够更轻松地创建和管理端到端分析数据流。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
End-to-End Machine Learning with Apache AsterixDB
Recent developments in machine learning and data science provide a foundation for extracting underlying information from Big Data. Unfortunately, current platforms and tools often require data scientists to glue together and maintain custom-built platforms consisting of multiple Big Data component technologies. In this paper, we explain how Apache AsterixDB, an open source Big Data Management System, can help to reduce the burden involved in using machine learning algorithms in Big Data analytics. In particular, we describe how AsterixDB's built-in support for user-defined functions (UDFs), the availability of UDFs in data ingestion pipelines and queries, and the provision of machine learning platform and notebook inter-operation capabilities can together enable data analysts to more easily create and manage end-to-end analytical dataflows.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信