X-DINC: Toward Cross-Layer ApproXimation for the Distributed and In-Network ACceleration of Multi-Kernel Applications

Impact factor 6.2 | Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, THEORY & METHODS
Zahra Ebrahimi, Maryam Eslami, Xun Xiao, Akash Kumar
Journal: Future Generation Computer Systems-The International Journal of Escience, vol. 172, Article 107864
Published: 2025-05-07
DOI: 10.1016/j.future.2025.107864
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25001591
Citations: 0

Abstract

With the rapid evolution of programmable network devices and the push for energy-efficient and sustainable computing, network infrastructures are evolving into a computing pipeline that provides In-Network Computing (INC) capability. Despite initial success in offloading single or small kernels to network devices, deploying multi-kernel applications remains challenging because of limited memory and computing resources and the lack of support for Floating Point (FP) and complex operations. To tackle these challenges, we present a cross-layer approximation and distribution methodology (X-DINC) that exploits the error resilience of applications. X-DINC uses a chain of techniques to facilitate kernel deployment and distribution across heterogeneous devices in INC environments. First, we identify approximation and optimization opportunities in the data acquisition and computation phases of multi-kernel applications. Second, we simplify complex arithmetic operations to cope with the computation limitations of programmable network switches. Third, we perform application-level sensitivity analysis to measure the trade-off between performance gain and Quality of Results (QoR) loss when individual kernels are approximated via various techniques. Finally, a greedy heuristic swiftly generates Pareto/near-Pareto mixed-precision configurations that maximize the performance gain while maintaining the user-defined QoR. X-DINC is prototyped on a Virtex-7 Field Programmable Gate Array (FPGA) and evaluated using the Blind Source Separation (BSS) application on an industrial audio dataset. Results show that, when distributed across 2 to 7 network nodes, X-DINC performs separation up to 35% faster with up to 88% lower Area-Delay Product (ADP) than an Accurate-Centralized approach, while maintaining audio quality within an acceptable range of 15–20 dB.
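
The abstract names a greedy heuristic for choosing mixed-precision configurations under a user-defined QoR bound but does not give its details. The following is only a minimal sketch of one common way such a heuristic can be written, not the authors' algorithm. It assumes that per-kernel QoR losses add up, and every name and number in it (`Option`, `greedy_mixed_precision`, `example_options`, the costs and dB values) is hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    bits: int        # hypothetical word length of one kernel variant
    cost: float      # hypothetical cost, e.g. area-delay product (arbitrary units)
    qor_loss: float  # hypothetical QoR loss in dB versus the accurate kernel


def greedy_mixed_precision(options, qor_accurate, qor_min):
    """Pick one option index per kernel.

    options: dict mapping kernel name -> list of Option, ordered from most
             accurate to least accurate.
    Assumption (not from the paper): QoR losses of kernels are additive.
    """
    # Start with every kernel at its most accurate option.
    choice = {kernel: 0 for kernel in options}

    def total_loss():
        return sum(options[k][i].qor_loss for k, i in choice.items())

    while True:
        best_ratio, best_kernel = 0.0, None
        current_loss = total_loss()
        for kernel, idx in choice.items():
            if idx + 1 >= len(options[kernel]):
                continue  # kernel is already at its lowest precision
            cur, nxt = options[kernel][idx], options[kernel][idx + 1]
            saving = cur.cost - nxt.cost
            extra_loss = nxt.qor_loss - cur.qor_loss
            if saving <= 0:
                continue  # lowering precision must actually reduce cost
            if qor_accurate - (current_loss + extra_loss) < qor_min:
                continue  # step would violate the user-defined QoR bound
            # Greedy criterion: cost saved per dB of extra QoR loss.
            ratio = saving / max(extra_loss, 1e-9)
            if ratio > best_ratio:
                best_ratio, best_kernel = ratio, kernel
        if best_kernel is None:
            return choice  # no feasible step left
        choice[best_kernel] += 1


# Made-up example data: two kernels, three precisions each.
example_options = {
    "kernel_a": [Option(32, 100.0, 0.0), Option(16, 40.0, 2.0), Option(8, 15.0, 9.0)],
    "kernel_b": [Option(32, 80.0, 0.0), Option(16, 55.0, 1.0), Option(8, 30.0, 3.0)],
}

result = greedy_mixed_precision(example_options, qor_accurate=25.0, qor_min=15.0)
for kernel, idx in result.items():
    print(kernel, example_options[kernel][idx])
```

With these made-up numbers the sketch settles on the 16-bit variant of `kernel_a` and the 8-bit variant of `kernel_b`, lowering the total hypothetical cost from 180 to 70 while the estimated QoR stays at 20 dB, above the 15 dB bound. A greedy pass like this is fast but gives no optimality guarantee, which is consistent with the abstract's wording "Pareto/near-Pareto".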
Source journal
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.