Towards Convergence of Extreme Computing and Big Data Centers

S. Matsuoka
{"title":"走向极限计算与大数据中心的融合","authors":"S. Matsuoka","doi":"10.1145/2912152.2912159","DOIUrl":null,"url":null,"abstract":"Rapid growth in the use cases and demands for extreme computing and huge data processing is leading to convergence of the two infrastructures. Tokyo Tech.'s TSUBAME3.0, a 2017 addition to the highly successful TSUBAME2.5, will aim to deploy a series of innovative technologies, including ultra-efficient liquid cooling and power control, petabytes of non-volatile memory, as well as low cost Petabit-class interconnect. To address the challenges of such technology adoption, proper system architecture, software stack, and algorithm must be desgined and developed; these are being addressed by several of our ongoing research projects as well as prototypes, such as the TSUBAME-KFC/DL prototype which became #1 in the world in power efficiency on the Green500 twice in a row, the Billion-way Resiliency project that is investigating effective methods for future resilient supercomputers, as well as the Extreme Big Data (EBD) project which is looking at co-design development of convergent system stack given future extreme data and computing workloads. We are already successful in developing various algorithms and sottware substrates to manipulate big data elements directly on extreme supercomputers, such as graphs, tables (sort), trees, files, etc. and in fact became #1 in the world on the Graph 500 twice including the latest Nov. 2015 version. Our recent focus is also how to ssupport new workloads in categorizing big data represented by deep learning, and there we are collaborating with several partners such as DENSO to improve the scalability and predictability of such workloads; recent trial allowed scalablity to utilize 1146 GPUs for the entire week for a CNN workload. For TSUBAME3 and 2.5 combined we espect to increase such capabilities to over 80 Petaflops in early 2017, or 7 times faster than the K computer.","PeriodicalId":443897,"journal":{"name":"Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards Convergence of Extreme Computing and Big Data Centers\",\"authors\":\"S. Matsuoka\",\"doi\":\"10.1145/2912152.2912159\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Rapid growth in the use cases and demands for extreme computing and huge data processing is leading to convergence of the two infrastructures. Tokyo Tech.'s TSUBAME3.0, a 2017 addition to the highly successful TSUBAME2.5, will aim to deploy a series of innovative technologies, including ultra-efficient liquid cooling and power control, petabytes of non-volatile memory, as well as low cost Petabit-class interconnect. To address the challenges of such technology adoption, proper system architecture, software stack, and algorithm must be desgined and developed; these are being addressed by several of our ongoing research projects as well as prototypes, such as the TSUBAME-KFC/DL prototype which became #1 in the world in power efficiency on the Green500 twice in a row, the Billion-way Resiliency project that is investigating effective methods for future resilient supercomputers, as well as the Extreme Big Data (EBD) project which is looking at co-design development of convergent system stack given future extreme data and computing workloads. 
We are already successful in developing various algorithms and sottware substrates to manipulate big data elements directly on extreme supercomputers, such as graphs, tables (sort), trees, files, etc. and in fact became #1 in the world on the Graph 500 twice including the latest Nov. 2015 version. Our recent focus is also how to ssupport new workloads in categorizing big data represented by deep learning, and there we are collaborating with several partners such as DENSO to improve the scalability and predictability of such workloads; recent trial allowed scalablity to utilize 1146 GPUs for the entire week for a CNN workload. For TSUBAME3 and 2.5 combined we espect to increase such capabilities to over 80 Petaflops in early 2017, or 7 times faster than the K computer.\",\"PeriodicalId\":443897,\"journal\":{\"name\":\"Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2912152.2912159\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2912152.2912159","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Rapid growth in the use cases and demands for extreme computing and huge data processing is leading to convergence of the two infrastructures. Tokyo Tech's TSUBAME3.0, a 2017 addition to the highly successful TSUBAME2.5, will aim to deploy a series of innovative technologies, including ultra-efficient liquid cooling and power control, petabytes of non-volatile memory, and low-cost Petabit-class interconnect. To address the challenges of adopting such technologies, the proper system architecture, software stack, and algorithms must be designed and developed; these are being addressed by several of our ongoing research projects and prototypes, such as the TSUBAME-KFC/DL prototype, which became #1 in the world in power efficiency on the Green500 twice in a row; the Billion-way Resiliency project, which is investigating effective methods for future resilient supercomputers; and the Extreme Big Data (EBD) project, which is looking at co-design of a convergent system stack for future extreme data and computing workloads.
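
The Green500 list mentioned above ranks systems by energy efficiency: sustained LINPACK performance divided by the average power drawn while running the benchmark, usually quoted in gigaflops per watt. As a minimal sketch of that bookkeeping (the function name and sample numbers are illustrative assumptions, not TSUBAME-KFC/DL's measured figures):

    # Green500-style energy-efficiency metric: sustained FLOP/s per watt.
    # The figures below are illustrative placeholders, not measured values.
    def gflops_per_watt(rmax_tflops: float, avg_power_kw: float) -> float:
        """Convert sustained HPL performance (TFLOP/s) and average power (kW)
        into the Green500 metric of GFLOP/s per watt."""
        gflops = rmax_tflops * 1e3   # TFLOP/s -> GFLOP/s
        watts = avg_power_kw * 1e3   # kW -> W
        return gflops / watts

    if __name__ == "__main__":
        # Hypothetical run: 150 TFLOP/s sustained while drawing 35 kW on average.
        print(f"{gflops_per_watt(150.0, 35.0):.2f} GFLOPS/W")
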
We have already been successful in developing various algorithms and software substrates that manipulate big data elements such as graphs, tables (sort), trees, and files directly on extreme-scale supercomputers, and in fact became #1 in the world on the Graph 500 twice, including on the latest November 2015 list.
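
The Graph 500 list referenced above ranks machines on breadth-first search over a very large synthetic graph, scored in traversed edges per second (TEPS). The single-node sketch below only illustrates the kind of kernel being measured; it is an assumed simplification, not the optimized distributed code behind the #1 placements.

    # Level-synchronous BFS over an adjacency-list graph with a rough TEPS
    # estimate, in the spirit of the Graph 500 kernel. Illustrative sketch only;
    # real runs distribute the graph over thousands of nodes and follow the
    # benchmark's own edge-counting rules.
    import time
    from collections import deque

    def bfs_teps(adj: dict[int, list[int]], root: int) -> tuple[dict[int, int], float]:
        """Return BFS parent pointers from `root` and an approximate TEPS score."""
        parent = {root: root}
        frontier = deque([root])
        edges_traversed = 0
        start = time.perf_counter()
        while frontier:
            u = frontier.popleft()
            for v in adj.get(u, []):
                edges_traversed += 1
                if v not in parent:
                    parent[v] = u
                    frontier.append(v)
        elapsed = max(time.perf_counter() - start, 1e-9)
        return parent, edges_traversed / elapsed

    if __name__ == "__main__":
        # Tiny hypothetical graph; actual Graph 500 problems use billions of edges.
        graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
        parents, teps = bfs_teps(graph, root=0)
        print(f"visited {len(parents)} vertices, roughly {teps:.0f} TEPS")
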
Our recent focus is also how to support the new workloads that categorize big data, as represented by deep learning; here we are collaborating with several partners, such as DENSO, to improve the scalability and predictability of such workloads, and a recent trial scaled a CNN workload to 1146 GPUs for an entire week. For TSUBAME3 and TSUBAME2.5 combined, we expect to increase such capabilities to over 80 petaflops in early 2017, or about 7 times faster than the K computer.
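
A common way to scale a CNN across many GPUs, as in the trial above, is data parallelism: every worker holds a replica of the model, computes gradients on its own shard of each mini-batch, and the gradients are averaged with an allreduce before all replicas apply the same update. The sketch below shows only that communication pattern using mpi4py and NumPy; it is a generic illustration under those assumptions, not the code from the DENSO collaboration or the 1146-GPU run.

    # Data-parallel gradient averaging: each MPI rank (one per GPU in practice)
    # computes a local gradient, and an allreduce averages it across all ranks
    # so that every model replica takes the identical step. Illustration only.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()

    num_params = 1_000_000                    # stand-in for the CNN's weights
    weights = np.zeros(num_params, dtype=np.float32)

    # Pretend each rank computed a gradient from its own mini-batch shard.
    local_grad = np.random.default_rng(rank).standard_normal(num_params).astype(np.float32)

    # Sum the gradients across ranks, then divide by the rank count to average.
    global_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
    global_grad /= size

    weights -= 0.01 * global_grad             # identical SGD step on every rank
    if rank == 0:
        print(f"averaged gradients across {size} ranks")

Launched with, for example, mpirun -np 4 python allreduce_sketch.py (a hypothetical file name), each rank would in practice be pinned to one GPU and compute its gradient there before the allreduce.
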