Putting a "big-data" platform to good use: training kinect

IEEE International Symposium on High-Performance Parallel Distributed Computing. Pub date: 2012-06-18. DOI: 10.1145/2287076.2287078

M. Budiu

Citations: 2

Abstract

In the last 7 years at Microsoft Research in Silicon Valley we have constructed the DryadLINQ software stack for large-scale data-parallel cluster computations. The architecture of the ensemble is depicted in Figure 1. The goal of the DryadLINQ project is to make writing parallel programs that manipulate large amounts of data (terabytes to petabytes) as easy as programming a single machine. DryadLINQ is a batch computation model optimized for throughput; since it targets large clusters of commodity computers, fault tolerance is a primary concern. A primary tenet is that moving computation close to the data is much cheaper than moving the data itself. Here we briefly discuss the current architecture of the system (more research is ongoing). Our software runs on relatively inexpensive computer clusters using unmodified Windows Server. It makes minimal assumptions about the underlying cluster, and has
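
Two tenets in the abstract, running computation where the data lives and tolerating machine failures in a batch job, can be illustrated with a small toy. The following is a minimal single-process Python sketch, not DryadLINQ code (DryadLINQ itself compiles .NET LINQ queries into distributed Dryad computations). It assumes each inner list stands in for a partition stored on its own machine, and it uses simple re-execution of a failed per-partition task as a simplified stand-in for fault tolerance. The names `TransientFailure`, `run_with_retry`, `word_count`, and the `failures` fault-injection parameter are hypothetical.

```python
"""Toy, single-process sketch of two DryadLINQ-style tenets (not DryadLINQ code).

Assumptions: each inner list stands in for a data partition stored on its own
machine; `TransientFailure`, `run_with_retry`, `word_count`, and the `failures`
fault-injection parameter are hypothetical names for illustration only.
"""
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, TypeVar

R = TypeVar("R")


class TransientFailure(Exception):
    """Simulates a crashed worker or lost machine."""


def run_with_retry(task: Callable[[], R], max_attempts: int = 3) -> R:
    """Re-execute a task until it succeeds or attempts run out."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()
        except TransientFailure as err:
            # Re-running is safe here because the task does not modify its partition.
            last_error = err
    raise RuntimeError(f"task failed {max_attempts} times") from last_error


def word_count(partitions: Sequence[List[str]],
               failures: Optional[Dict[int, int]] = None) -> Counter:
    """Count words per partition locally, then merge the small partial results."""
    remaining = dict(failures or {})  # partition index -> injected failures left
    partials = []
    for index, lines in enumerate(partitions):
        def local_stage(index: int = index, lines: List[str] = lines) -> Counter:
            # "Move computation to the data": only this Counter leaves the partition.
            if remaining.get(index, 0) > 0:
                remaining[index] -= 1
                raise TransientFailure(f"worker for partition {index} crashed")
            return Counter(word for line in lines for word in line.split())
        partials.append(run_with_retry(local_stage))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Aggregation stage: add counts from every partition.
    return total


if __name__ == "__main__":
    data = [["the cat sat", "the dog"], ["a cat ran"], ["the end"]]
    counts = word_count(data, failures={1: 2})  # partition 1 crashes twice, then succeeds
    print(counts["the"], counts["cat"])  # prints: 3 2
```

The design choice the sketch highlights is that the bulky input never moves: each partition is reduced to a small summary before anything is combined, and a failed partition is recomputed rather than restarting the whole job. A real cluster system would schedule these stages on different computers instead of in a loop.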

Source journal

IEEE International Symposium on High-Performance Parallel Distributed Computing

Self-citation rate

0.00%