Usability in machine learning at scale with GraphLab

Carlos Guestrin
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. Published 2013-10-27. DOI: 10.1145/2505515.2527108 (https://doi.org/10.1145/2505515.2527108)
Citations: 2

Abstract

Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle "Big Data."

In this talk, we will focus on:

- Examining common algorithmic patterns in distributed ML methods.
- Qualifying the challenges of implementing these algorithms in real distributed systems.
- Describing computational frameworks for implementing these algorithms at scale.
- Addressing a significant core challenge to large-scale ML: enabling the widespread adoption of machine learning beyond experts.

In the latter part, we will focus mainly on the GraphLab framework, which naturally expresses asynchronous, dynamic graph computations that are key for state-of-the-art ML algorithms. When these algorithms are expressed in our higher-level abstraction, GraphLab will effectively address many of the underlying parallelism challenges, including data distribution, optimized communication, and guaranteeing sequential consistency, a property that is surprisingly important for many ML algorithms. On a variety of large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop. In recent months, GraphLab has received many tens of thousands of downloads, and is being actively used by a number of startups, companies, research labs and universities.
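To make the phrase "asynchronous, dynamic graph computations" concrete, the following is a minimal single-machine sketch in plain Python. It is not the GraphLab API: the function name `dynamic_pagerank`, its parameters, and the choice of PageRank as the example are assumptions made here purely for illustration. Each vertex update reads the most recent values of its neighbours (asynchronous), and a vertex is rescheduled only when one of its in-neighbours changed by more than a tolerance (dynamic).

```python
from collections import deque


def dynamic_pagerank(out_edges, damping=0.85, tol=1e-6):
    """Illustrative sketch only (hypothetical helper, not GraphLab code).

    out_edges maps each vertex to the list of vertices it links to.
    Returns unnormalized PageRank values, one per vertex.
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(out_edges)
    for targets in out_edges.values():
        vertices.update(targets)

    # Build the reverse adjacency so each update can gather from in-neighbours.
    in_edges = {v: [] for v in vertices}
    for src, targets in out_edges.items():
        for dst in targets:
            in_edges[dst].append(src)
    out_degree = {v: len(out_edges.get(v, [])) for v in vertices}

    rank = {v: 1.0 for v in vertices}
    queue = deque(sorted(vertices))  # every vertex is scheduled once at the start
    scheduled = set(vertices)

    while queue:
        v = queue.popleft()
        scheduled.discard(v)

        # Gather: use the latest neighbour values, with no global iteration barrier.
        total = sum(rank[u] / out_degree[u] for u in in_edges[v])
        new_rank = (1.0 - damping) + damping * total
        change = abs(new_rank - rank[v])
        rank[v] = new_rank

        # Scatter: reschedule out-neighbours only if this vertex moved enough.
        if change > tol:
            for w in out_edges.get(v, []):
                if w not in scheduled:
                    scheduled.add(w)
                    queue.append(w)
    return rank


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, value in sorted(dynamic_pagerank(graph).items()):
        print(vertex, round(value, 4))
```

Because this sketch runs in one thread, updates are trivially serialized. In a parallel or distributed setting, a framework has to ensure that concurrent updates to neighbouring vertices behave as if they ran in some sequential order, which is the sequential consistency property the abstract refers to.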