Upgrading a high performance computing environment for massive data processing

IF 2.4 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Internet Services and Applications, Pub Date: 2019-10-16, DOI: 10.1186/s13174-019-0118-7

Lucas M. Ponce, Walter dos Santos, Wagner Meira, Dorgival Guedes, Daniele Lezzi, Rosa M. Badia


Citations: 7

Abstract



High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is the development of an integrated environment that integrates (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, reducing execution time. The integration with Lemonade facilitates the use of COMPSs and may help its popularization in the Data Science community, by providing efficient algorithm implementations for experts from the data domain who want to develop applications with a higher-level abstraction.


Source journal

Journal of Internet Services and Applications (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore

3.70

Self-citation rate

0.00%

Articles published

Review time

13 weeks