Lessons from Large-Scale Software as a Service at Databricks

M. Zaharia
DOI: 10.1145/3357223.3365870 (https://doi.org/10.1145/3357223.3365870)
Published in: Proceedings of the ... ACM Symposium on Cloud Computing [electronic resource] : SOCC ... ... SoCC (Conference)
Publication date: 2019-11-20
Citations: 5

Abstract

The cloud has become one of the most attractive ways for enterprises to purchase software, but it requires building products in a very different way from traditional software, and this difference has not been heavily studied in research. I will explain some of these challenges based on my experience at Databricks, a startup that provides a data analytics platform as a service on AWS and Azure. Databricks manages millions of VMs per day to run data engineering and machine learning workloads using Apache Spark, TensorFlow, Python, and other software for thousands of customers. Two main challenges arise in this context: (1) building a reliable, scalable control plane that can manage thousands of customers at once, and (2) adapting the data processing software itself (e.g., Apache Spark) for an elastic cloud environment (for instance, autoscaling instead of assuming static clusters). These challenges are especially significant for data analytics workloads, whose users constantly push boundaries in terms of scale (e.g., number of VMs used, data size, metadata size, number of concurrent users, etc.). I'll describe some of the common challenges that our new services face and some of the main ways that Databricks has extended and modified open source analytics software for the cloud environment (e.g., designing an autoscaling engine for Apache Spark and creating a transactional storage layer on top of S3 in the Delta Lake open source product).
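The abstract contrasts autoscaling with assuming a static cluster. As a rough illustration only, the sketch below sizes a cluster from its current task demand instead of fixing the worker count up front. It is not the autoscaling engine described in the talk: `ScalingPolicy`, `target_workers`, the slot count, and the bounds are all hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy knobs; none of these names come from the talk."""
    min_workers: int = 1
    max_workers: int = 20
    tasks_per_worker: int = 4  # assumed number of task slots on each worker


def target_workers(pending_tasks: int, running_tasks: int, policy: ScalingPolicy) -> int:
    """Return the worker count that gives every task a slot, clamped to the policy bounds."""
    demand = pending_tasks + running_tasks
    needed = math.ceil(demand / policy.tasks_per_worker)
    return max(policy.min_workers, min(policy.max_workers, needed))


if __name__ == "__main__":
    policy = ScalingPolicy()
    # An idle cluster shrinks to the floor; a backlog grows it up to the ceiling.
    for pending, running in [(0, 0), (5, 0), (30, 10), (1000, 0)]:
        print(pending, running, target_workers(pending, running, policy))
```

A static cluster corresponds to setting `min_workers` equal to `max_workers`. A production autoscaler would likely also need to account for factors this sketch ignores, such as VM start-up latency and state held on workers that are about to be removed.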