Parallel Generalized Hebbian Algorithm for Large Scale Data Analytics

M. Yaseen, Mohammad Naeemullah, Ibarhim Adeb Mansoor
{"title":"大规模数据分析的并行广义Hebbian算法","authors":"M. Yaseen, Mohammad Naeemullah, Ibarhim Adeb Mansoor","doi":"10.58496/mjbd/2021/003","DOIUrl":null,"url":null,"abstract":"In order to store and analyse large amounts of data on a parallel cluster, Big Data Systems such as Hadoop and DBMSs require a complex configuration and tuning procedure. This is mostly the result of static partitioning occurring whenever data sets are imported into the file system or transferred into it. Following that, parallel processing is carried out in a distributed fashion, with the objective of achieving balanced parallel execution among nodes. The system is notoriously difficult to configure, particularly in the areas of node synchronisation, data redistribution, and distributed caching in main memory. The extended Hebbian algorithm, abbreviated as GHA, is a linear feedforward neural network model for unsupervised learning that finds the majority of its applications in principle components analysis. Sanger's rule is another name for the GHA that may be found in the academic literature. Its formulation and stability, with the additional feature that it may be used to networks that have more than one output. A unique hardware architecture for principal component analysis is presented here in the form of a paper. The Generalized Hebbian Algorithm (GHA) was chosen as the foundation for the design because to the fact that it is both straightforward and efficient. The architecture may be broken down into three distinct parts: the memory unit, the weight vector updating unit, and the primary computing unit. Within the weight vector updating unit, the computation of various synaptic weight vectors uses the same circuit in order to cut down on the area expenses. This is done in order to save space. The GHA architecture incorporates a versatile multi-computer framework that is based on mpi. Therefore, GHA may be efficiently executed on platforms that utilise either sequential processing or parallel processing. When the data set is studied for a short period of time or when a dynamic number of virtual processors is selected at runtime, we predict that our architecture will be able to profit from parallel processing on the cloud. In this research, a parallel implementation of a variety of machine learning algorithms that are built on top of the MapReduce paradigm is presented with the purpose of improving processing speed and saving time.","PeriodicalId":325612,"journal":{"name":"Mesopotamian Journal of Big Data","volume":"119 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Parallel Generalized Hebbian Algorithm for Large Scale Data Analytics\",\"authors\":\"M. Yaseen, Mohammad Naeemullah, Ibarhim Adeb Mansoor\",\"doi\":\"10.58496/mjbd/2021/003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In order to store and analyse large amounts of data on a parallel cluster, Big Data Systems such as Hadoop and DBMSs require a complex configuration and tuning procedure. This is mostly the result of static partitioning occurring whenever data sets are imported into the file system or transferred into it. Following that, parallel processing is carried out in a distributed fashion, with the objective of achieving balanced parallel execution among nodes. 
The system is notoriously difficult to configure, particularly in the areas of node synchronisation, data redistribution, and distributed caching in main memory. The extended Hebbian algorithm, abbreviated as GHA, is a linear feedforward neural network model for unsupervised learning that finds the majority of its applications in principle components analysis. Sanger's rule is another name for the GHA that may be found in the academic literature. Its formulation and stability, with the additional feature that it may be used to networks that have more than one output. A unique hardware architecture for principal component analysis is presented here in the form of a paper. The Generalized Hebbian Algorithm (GHA) was chosen as the foundation for the design because to the fact that it is both straightforward and efficient. The architecture may be broken down into three distinct parts: the memory unit, the weight vector updating unit, and the primary computing unit. Within the weight vector updating unit, the computation of various synaptic weight vectors uses the same circuit in order to cut down on the area expenses. This is done in order to save space. The GHA architecture incorporates a versatile multi-computer framework that is based on mpi. Therefore, GHA may be efficiently executed on platforms that utilise either sequential processing or parallel processing. When the data set is studied for a short period of time or when a dynamic number of virtual processors is selected at runtime, we predict that our architecture will be able to profit from parallel processing on the cloud. In this research, a parallel implementation of a variety of machine learning algorithms that are built on top of the MapReduce paradigm is presented with the purpose of improving processing speed and saving time.\",\"PeriodicalId\":325612,\"journal\":{\"name\":\"Mesopotamian Journal of Big Data\",\"volume\":\"119 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-03-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Mesopotamian Journal of Big Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.58496/mjbd/2021/003\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mesopotamian Journal of Big Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.58496/mjbd/2021/003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In order to store and analyse large amounts of data on a parallel cluster, Big Data systems such as Hadoop and DBMSs require a complex configuration and tuning procedure. This is mostly the result of static partitioning, which occurs whenever data sets are imported into or transferred into the file system. Parallel processing is then carried out in a distributed fashion, with the objective of achieving balanced parallel execution among nodes. Such systems are notoriously difficult to configure, particularly with respect to node synchronisation, data redistribution, and distributed caching in main memory. The Generalized Hebbian Algorithm (GHA), also known in the literature as Sanger's rule, is a linear feedforward neural network model for unsupervised learning that finds most of its applications in principal component analysis. Its formulation and stability are well established, and it has the additional feature that it can be applied to networks with more than one output. This paper presents a hardware architecture for principal component analysis, with GHA chosen as the foundation of the design because it is both straightforward and efficient. The architecture is divided into three distinct parts: the memory unit, the weight vector updating unit, and the primary computing unit. Within the weight vector updating unit, the computation of the different synaptic weight vectors shares a single circuit in order to cut down on area cost. The GHA architecture also incorporates a versatile multi-computer framework based on MPI, so GHA can be executed efficiently on platforms that use either sequential or parallel processing. When a data set is analysed for only a short period of time, or when a dynamic number of virtual processors is selected at runtime, we expect the architecture to profit from parallel processing on the cloud. This research therefore presents a parallel implementation of a variety of machine learning algorithms built on top of the MapReduce paradigm, with the purpose of improving processing speed and saving time.
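For reference, the GHA weight update is Sanger's rule: for an input x with outputs y_i = w_i · x, each row is adjusted by Δw_i = η y_i (x − Σ_{k≤i} y_k w_k), which drives the rows of the weight matrix towards the leading principal eigenvectors of the input covariance. The paper itself gives no code; the NumPy sketch below (all names are our own illustrative choices, not the authors') shows the rule in matrix form.

```python
import numpy as np

def gha_update(W, x, lr=1e-3):
    """One Generalized Hebbian Algorithm (Sanger's rule) step.

    W  : (m, n) weight matrix, one row per extracted component
    x  : (n,) zero-mean input sample
    lr : learning rate
    """
    y = W @ x                                   # outputs y_i = w_i . x
    # Sanger's rule in matrix form:
    #   dW = lr * (y x^T - LT(y y^T) W), LT = lower triangle incl. diagonal
    dW = lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W + dW

# Toy usage: extract the top 3 principal directions of correlated 10-D data.
rng = np.random.default_rng(0)
mix = rng.normal(size=(10, 10))
data = rng.normal(size=(5000, 10)) @ mix        # correlated samples
data -= data.mean(axis=0)                       # GHA assumes zero-mean input

W = rng.normal(scale=0.1, size=(3, 10))
for _ in range(10):                             # a few passes over the data
    for x in data:
        W = gha_update(W, x)
```

Note that every row is updated by the same outer-product computation; this shared structure is what allows the weight vector updating unit described above to reuse a single circuit for all synaptic weight vectors.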
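The abstract does not say how the MapReduce implementation is organised. One plausible reading, sketched below under the assumption of synchronous mini-batch training, is that each map task sums the GHA update over its data shard while the weights stay fixed, and the reduce task adds the partial updates before the next round. All function names here are hypothetical.

```python
import numpy as np
from functools import reduce

def map_partial_update(W, shard):
    """Map phase: one worker sums the GHA update over its data shard,
    holding W fixed for the whole mini-batch (a synchronous
    approximation of the per-sample rule)."""
    dW = np.zeros_like(W)
    for x in shard:
        y = W @ x
        dW += np.outer(y, x) - np.tril(np.outer(y, y)) @ W
    return dW

def reduce_updates(a, b):
    """Reduce phase: partial updates from all shards are summed."""
    return a + b

def parallel_gha_step(W, shards, lr=1e-4):
    """One synchronous round: map over shards, reduce, then apply."""
    # On a real Hadoop or MPI cluster the map calls below would run on
    # separate nodes; they run serially here to keep the sketch runnable.
    partials = [map_partial_update(W, s) for s in shards]
    return W + lr * reduce(reduce_updates, partials)

# Example: split the samples across 4 "nodes" and take several rounds.
rng = np.random.default_rng(1)
data = rng.normal(size=(4000, 10))
data -= data.mean(axis=0)
W = rng.normal(scale=0.1, size=(3, 10))
for _ in range(20):
    W = parallel_gha_step(W, np.array_split(data, 4))
```

Holding W fixed within a round is an approximation of the per-sample rule above, but it is what makes the map tasks independent and the reduce a plain sum, which is the shape a balanced MapReduce job, or an MPI allreduce, expects.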