Usability in machine learning at scale with GraphLab

Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. Pub Date: 2013-10-27. DOI: 10.1145/2505515.2527108

Carlos Guestrin

Citations: 2

Abstract

Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle "Big Data."

In this talk, we will focus on:

- Examining common algorithmic patterns in distributed ML methods.
- Qualifying the challenges of implementing these algorithms in real distributed systems.
- Describing computational frameworks for implementing these algorithms at scale.
- Addressing a significant core challenge to large-scale ML: enabling the widespread adoption of machine learning beyond experts.

In the latter part, we will focus mainly on the GraphLab framework, which naturally expresses asynchronous, dynamic graph computations that are key for state-of-the-art ML algorithms. When these algorithms are expressed in our higher-level abstraction, GraphLab will effectively address many of the underlying parallelism challenges, including data distribution, optimized communication, and guaranteeing sequential consistency, a property that is surprisingly important for many ML algorithms. On a variety of large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop. In recent months, GraphLab has received many tens of thousands of downloads, and is being actively used by a number of startups, companies, research labs and universities.
