Putting a "big-data" platform to good use: training Kinect

IEEE International Symposium on High-Performance Parallel Distributed Computing
Pub Date: 2012-06-18
DOI: 10.1145/2287076.2287078

M. Budiu

Citations: 2

Abstract

In the last 7 years at Microsoft Research in Silicon Valley we have constructed the DryadLINQ software stack for large-scale data-parallel cluster computations. The architecture of the ensemble is depicted in Figure 1. The goal of the DryadLINQ project is to make writing parallel programs that manipulate large amounts of data (terabytes to petabytes) as easy as programming a single machine. DryadLINQ is a batch computation model optimized for throughput; since it targets large clusters of commodity computers, fault tolerance is a primary concern. A primary tenet is that moving computation close to the data is much cheaper than moving the data itself. Here we briefly discuss the current architecture of the system (though more research is ongoing). Our software runs on relatively inexpensive computer clusters using unmodified Windows Server. It makes minimal assumptions about the underlying cluster, and has…
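The abstract names three design points: a data-parallel batch model, fault tolerance on commodity clusters, and moving computation to the data rather than the reverse. The following is a minimal single-process Python sketch of how those ideas fit together for a word count; it is not DryadLINQ's API (DryadLINQ is built on .NET LINQ). The names `Partition`, `local_word_count`, `run_with_retry`, `word_count`, and the simulated `failures` map are hypothetical, and handling faults by re-executing a deterministic vertex is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Partition:
    """Hypothetical data partition stored on one cluster node."""
    node: str
    records: List[str]


def local_word_count(records: List[str]) -> Counter:
    """Vertex computation. It runs on the node that already holds the
    records, so only the small partial result has to cross the network."""
    counts: Counter = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def run_with_retry(fn: Callable[[List[str]], Counter], part: Partition,
                   failures: Dict[str, int], max_attempts: int = 3) -> Counter:
    """Re-execute a vertex whose node 'fails' (simulated via `failures`).
    Because the vertex is a deterministic function of its input partition,
    re-running it yields the same result as a failure-free run."""
    for attempt in range(max_attempts):
        if failures.get(part.node, 0) > attempt:
            continue  # simulated crash on this attempt; retry
        return fn(part.records)
    raise RuntimeError(f"vertex on {part.node} failed {max_attempts} times")


def word_count(partitions: List[Partition],
               failures: Optional[Dict[str, int]] = None) -> Counter:
    failures = failures or {}
    # Stage 1: one vertex per partition, placed where the data lives.
    partials = [run_with_retry(local_word_count, p, failures) for p in partitions]
    # Stage 2: merge the partial aggregates (the only data that moves).
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    parts = [Partition("n0", ["a b a"]), Partition("n1", ["b c"])]
    print(word_count(parts, failures={"n1": 1}))
```

Running it with one simulated failure on node `n1` still produces the same totals as a failure-free run (`a` and `b` twice, `c` once), which is the property that makes re-execution a workable fault-tolerance strategy for batch jobs in this sketch.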



