Big data processing with 1D-Crosspoint Arrays

Taeyoung An, A. Oruç

International Journal of Parallel Emergent and Distributed Systems, vol. 38, no. 1, pp. 249-274. Published 2023-02-27.
DOI: 10.1080/17445760.2023.2172574
Impact factor: 0.6 (JCR Q4, Computer Science, Theory & Methods). Citations: 0.

Abstract

Increased chip densities offer massive computation power to deal with fundamental big data operations such as searching and sorting. At the same time, the proliferation of processing elements (PEs) in such multicore chips, together with the employment of more aggressive parallel algorithms, causes the space needed for interprocessor communications to dominate the overall chip space, potentially reducing computational efficiency. To overcome this issue, this paper introduces a new architecture that uses simple crosspoint switches to pair PEs instead of a complex interconnection network. This new architecture may be viewed as a 'quadratic' array of processors, as it uses quadratically many PEs rather than the linearly many PEs of linear array processor models. The switches between adjacent PEs are created using a cyclic permutation wiring idea with […] PEs and as many crosspoints. We demonstrate the versatility of this new parallel architecture by designing fast algorithms to sort and search a list of n elements with it. In particular, we show that finding a minimum, finding a maximum, and searching a list of n elements can all be performed on this parallel architecture in […] time with additional elementary logic gates with […] fan-in, and in […] time with […] fan-in. We further show that sorting a list of n elements can also be carried out in […] time using additional elementary logic gates with […] fan-in and threshold logic gates on the same parallel architecture. The sorting time increases to […] if only elementary logic gates with […] fan-in are used. In addition, we establish how similar queries can be handled within the same order of time complexities. We use this new parallel architecture to perform sorting and searching on big data on three different models. The first of these models provides an efficient implementation of enumeration sorting and searching for moderate-size big data sets. The second model offers increased parallelism by replicating the new parallel architecture, but its hardware complexity also limits its use to moderate-size big data sets. The third model removes this limitation by introducing a tradeoff parameter between the time and hardware complexity of the overall computation, thereby providing an optimal use of available resources within a given chip-set space.
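The abstract names enumeration sorting and pairwise PE pairing but does not spell out the procedure. As a rough illustration only, the following is a minimal sequential Python sketch of classic enumeration (rank) sorting, in which every unordered pair of elements is compared once and each element's rank is the number of elements that precede it. It is not the authors' hardware design: the function names are hypothetical, the loop over pairs merely stands in for comparators that would operate in parallel, and the tie-breaking rule (equal values ordered by index) is an assumption of this sketch.

```python
from itertools import combinations


def pairwise_outcomes(values):
    """Compare every unordered pair (i, j), i < j, exactly once.

    The result maps (i, j) to True when values[i] should precede values[j].
    Using <= breaks ties by index, so the resulting ranks are all distinct.
    """
    return {(i, j): values[i] <= values[j]
            for i, j in combinations(range(len(values)), 2)}


def ranks_from_outcomes(n, outcomes):
    """Rank of element i = number of elements that precede it in sorted order."""
    ranks = [0] * n
    for (i, j), i_first in outcomes.items():
        if i_first:
            ranks[j] += 1  # element i precedes element j
        else:
            ranks[i] += 1  # element j precedes element i
    return ranks


def enumeration_sort(values):
    """Place each element directly at the position given by its rank."""
    values = list(values)
    ranks = ranks_from_outcomes(len(values), pairwise_outcomes(values))
    out = [None] * len(values)
    for i, r in enumerate(ranks):
        out[r] = values[i]
    return out


def min_and_max_index(values):
    """Index of a minimum (rank 0) and of a maximum (rank n - 1).

    Raises ValueError on an empty input.
    """
    values = list(values)
    if not values:
        raise ValueError("empty input")
    ranks = ranks_from_outcomes(len(values), pairwise_outcomes(values))
    return ranks.index(0), ranks.index(len(values) - 1)


if __name__ == "__main__":
    data = [5, 2, 9, 2, 7]
    print(enumeration_sort(data))
    print(min_and_max_index(data))
```

In this sketch a list of n elements needs n(n-1)/2 comparisons, one per unordered pair, which is why pairing-based schemes of this kind trade a quadratic amount of comparison hardware for very shallow computation.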
Source journal metrics: CiteScore 2.30; self-citation rate 0.00%; articles published 27.