Ease the Process of Machine Learning with Dataflow

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI:10.1145/2983323.2983327

Tianyou Guo, Jun Xu, Xiaohui Yan, Jianpeng Hou, Ping Li, Zhaohui Li, J. Guo, Xueqi Cheng

{"title":"Ease the Process of Machine Learning with Dataflow","authors":"Tianyou Guo, Jun Xu, Xiaohui Yan, Jianpeng Hou, Ping Li, Zhaohui Li, J. Guo, Xueqi Cheng","doi":"10.1145/2983323.2983327","DOIUrl":null,"url":null,"abstract":"Machine learning algorithms have become the key components in many big data applications. However, the full potential of machine learning is still far from been realized because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come from not only the implementation of the algorithms themselves, but also the processing for applying them to real applications which often involve multiple steps and different algorithms. In this demo we present a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks. In the system, a learning task is formulated as a directed acyclic graph (DAG) in which each node represents an operation (e.g., a machine learning algorithm), and each edge represents the flow of the data from one node to its descendants. Graphical user interface is implemented for making users to create, configure, submit, and monitor a task in a drag-and-drop manner. Advantages of the system include 1) lowering the barriers of defining and executing machine learning tasks; 2) sharing and re-using the implementations of the algorithms, the task dataflow DAGs, and the (intermediate) experimental results; 3) seamlessly integrating the stand-alone algorithms as well as the distributed algorithms in one task. The system has been deployed as a machine learning service and can be access from the Internet.","PeriodicalId":250808,"journal":{"name":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM International on Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2983323.2983327","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 4

Abstract

Machine learning algorithms have become the key components in many big data applications. However, the full potential of machine learning is still far from been realized because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come from not only the implementation of the algorithms themselves, but also the processing for applying them to real applications which often involve multiple steps and different algorithms. In this demo we present a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks. In the system, a learning task is formulated as a directed acyclic graph (DAG) in which each node represents an operation (e.g., a machine learning algorithm), and each edge represents the flow of the data from one node to its descendants. Graphical user interface is implemented for making users to create, configure, submit, and monitor a task in a drag-and-drop manner. Advantages of the system include 1) lowering the barriers of defining and executing machine learning tasks; 2) sharing and re-using the implementations of the algorithms, the task dataflow DAGs, and the (intermediate) experimental results; 3) seamlessly integrating the stand-alone algorithms as well as the distributed algorithms in one task. The system has been deployed as a machine learning service and can be access from the Internet.

查看原文本刊更多论文

用数据流简化机器学习的过程

机器学习算法已经成为许多大数据应用的关键组成部分。然而，机器学习的全部潜力仍然远未实现，因为使用机器学习算法很难，特别是在Hadoop和Spark等分布式平台上。关键的障碍不仅来自算法本身的实现，还来自将它们应用于实际应用的处理，这通常涉及多个步骤和不同的算法。在这个演示中，我们展示了一个通用的基于数据流的系统，用于简化将机器学习算法应用于现实世界任务的过程。在系统中，一个学习任务被表述为一个有向无环图(DAG)，其中每个节点代表一个操作(例如，一个机器学习算法)，每个边代表数据从一个节点流向它的后代。实现了图形用户界面，使用户可以通过拖放方式创建、配置、提交和监视任务。该系统的优点包括:1)降低了定义和执行机器学习任务的障碍;2)算法实现、任务数据流dag和(中间)实验结果的共享和重用;3)将独立算法和分布式算法无缝集成到一个任务中。该系统已被部署为机器学习服务，可以从互联网访问。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

自引率

0.00%

发文量