CedCom: A high-performance architecture for Big Data applications

Tanguy Raynaud, R. Haque, H. Aït-Kaci
{"title":"CedCom:大数据应用的高性能架构","authors":"Tanguy Raynaud, R. Haque, H. Aït-Kaci","doi":"10.1109/AICCSA.2014.7073257","DOIUrl":null,"url":null,"abstract":"Distributed architecture is widely used for storing and processing Big Data. Operations on Big Data need first, locating the required data blocks and then, reading them. Data can be located in different types of memories in particular, cache memory, main memory, and secondary memory. Reading data from secondary memory to process Big Data jobs is not an ideal approach especially for high performance applications because, accessing data in secondary devices can be slow for processors. In addition, fetching data from main memory is time consuming due to limited I/O bandwidth. These system level issues are barriers for optimizing performance of Big Data applications. Simply put, for optimizing the application performance, it is not sufficient to have efficient algorithms only, an efficient architecture is needed to provide faster data access by the processors. The need for such an architecture has been documented in the literature, however, the state of the art is still missing an efficient architecture. This paper develops a promising architecture which caches data in main memory. It essentially transforms a main memory into a attraction memory which enables high-speed data access. Also, it enables automatic migration of data blocks and computations across the nodes contained in the clusters. It offers an exchange protocol for fast transfer of data blocks between the different physical nodes and speeds up job processing. The proposed architecture combines the power of Cache-Only Memory Architecture (COMA) and the structural principle of Hadoop.","PeriodicalId":412749,"journal":{"name":"2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"CedCom: A high-performance architecture for Big Data applications\",\"authors\":\"Tanguy Raynaud, R. Haque, H. Aït-Kaci\",\"doi\":\"10.1109/AICCSA.2014.7073257\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distributed architecture is widely used for storing and processing Big Data. Operations on Big Data need first, locating the required data blocks and then, reading them. Data can be located in different types of memories in particular, cache memory, main memory, and secondary memory. Reading data from secondary memory to process Big Data jobs is not an ideal approach especially for high performance applications because, accessing data in secondary devices can be slow for processors. In addition, fetching data from main memory is time consuming due to limited I/O bandwidth. These system level issues are barriers for optimizing performance of Big Data applications. Simply put, for optimizing the application performance, it is not sufficient to have efficient algorithms only, an efficient architecture is needed to provide faster data access by the processors. The need for such an architecture has been documented in the literature, however, the state of the art is still missing an efficient architecture. This paper develops a promising architecture which caches data in main memory. It essentially transforms a main memory into a attraction memory which enables high-speed data access. 
Also, it enables automatic migration of data blocks and computations across the nodes contained in the clusters. It offers an exchange protocol for fast transfer of data blocks between the different physical nodes and speeds up job processing. The proposed architecture combines the power of Cache-Only Memory Architecture (COMA) and the structural principle of Hadoop.\",\"PeriodicalId\":412749,\"journal\":{\"name\":\"2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AICCSA.2014.7073257\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICCSA.2014.7073257","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Distributed architectures are widely used for storing and processing Big Data. Operations on Big Data first require locating the needed data blocks and then reading them. Data can reside in different types of memory, in particular cache memory, main memory, and secondary memory. Reading data from secondary memory to process Big Data jobs is not an ideal approach, especially for high-performance applications, because accessing data on secondary devices is slow for processors. In addition, fetching data from main memory is time-consuming due to limited I/O bandwidth. These system-level issues are barriers to optimizing the performance of Big Data applications. Simply put, efficient algorithms alone are not sufficient to optimize application performance; an efficient architecture is also needed to give processors faster access to data. The need for such an architecture has been documented in the literature, yet the state of the art still lacks one. This paper develops a promising architecture that caches data in main memory. It essentially transforms main memory into an attraction memory, which enables high-speed data access. It also enables automatic migration of data blocks and computations across the nodes of a cluster, and it offers an exchange protocol for fast transfer of data blocks between different physical nodes, which speeds up job processing. The proposed architecture combines the power of Cache-Only Memory Architecture (COMA) with the structural principles of Hadoop.
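To make the attraction-memory idea concrete, the following minimal Java sketch illustrates COMA-style block migration: each node caches blocks in its main memory, and a block is pulled to ("attracted by") the node that accesses it, so subsequent accesses become local. This is an illustrative sketch under assumed names (AttractionMemoryDemo, Cluster.access, etc.), not CedCom's actual implementation; a real system would also handle replication, coherence, and the network-level exchange protocol described in the paper.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch (not the paper's implementation) of a COMA-style
 * attraction memory: each node caches data blocks in its main memory,
 * and a block migrates to the node that accesses it, so that repeated
 * accesses become local and fast.
 */
public class AttractionMemoryDemo {

    /** A data block identified by an id; the payload stands in for real data. */
    static final class Block {
        final String id;
        final byte[] payload;
        Block(String id, byte[] payload) { this.id = id; this.payload = payload; }
    }

    /** A cluster node holding part of the distributed attraction memory. */
    static final class Node {
        final String name;
        final Map<String, Block> attractionMemory = new HashMap<>();
        Node(String name) { this.name = name; }
    }

    /** A toy cluster: a directory maps each block id to its current owner node. */
    static final class Cluster {
        final Map<String, Node> directory = new HashMap<>();

        void put(Node owner, Block block) {
            owner.attractionMemory.put(block.id, block);
            directory.put(block.id, owner);
        }

        /**
         * Access a block from 'requester'. On a local hit, return it directly.
         * On a miss, fetch it from the current owner (the exchange step) and
         * migrate it into the requester's attraction memory, updating the directory.
         */
        Block access(Node requester, String blockId) {
            Block local = requester.attractionMemory.get(blockId);
            if (local != null) {
                System.out.println(requester.name + ": local hit for " + blockId);
                return local;
            }
            Node owner = directory.get(blockId);
            if (owner == null) throw new IllegalStateException("unknown block " + blockId);
            System.out.println(requester.name + ": miss, migrating " + blockId + " from " + owner.name);
            Block block = owner.attractionMemory.remove(blockId); // block leaves the old owner
            requester.attractionMemory.put(blockId, block);       // ...and is attracted here
            directory.put(blockId, requester);
            return block;
        }
    }

    public static void main(String[] args) {
        Node n1 = new Node("node-1");
        Node n2 = new Node("node-2");
        Cluster cluster = new Cluster();

        cluster.put(n1, new Block("blk-0001", new byte[]{1, 2, 3}));

        cluster.access(n2, "blk-0001"); // miss on node-2: block migrates from node-1
        cluster.access(n2, "blk-0001"); // now a local hit on node-2
    }
}
```

The design choice being illustrated is that data placement follows computation: instead of shipping data from a fixed home node on every access, ownership moves with use, which is what lets main memory behave like a large cache across the cluster.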