High-Bandwidth Low-Latency Approximate Interconnection Networks

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2017-05-05. DOI: 10.1109/HPCA.2017.38

Daichi Fujiki, K. Ishii, I. Fujiwara, Hiroki Matsutani, H. Amano, H. Casanova, M. Koibuchi

Citations: 21

Abstract

Computational applications are subject to various kinds of numerical errors, ranging from deterministic round-off errors to soft errors caused by non-deterministic bit flips, which do not lead to application failure but corrupt application results. Non-deterministic bit flips are typically mitigated in hardware using various error-correcting codes (ECC). In practice, however, due to performance and cost concerns, these techniques do not guarantee error-free execution. On large-scale computing platforms, soft errors occur with non-negligible probability in RAM and on the CPU, and it has become clear that applications must tolerate them. For some applications (e.g., data analysis and multimedia applications), this tolerance is intrinsic, as result quality can remain acceptable even in the presence of soft errors. Tolerance can also be built into the application by resolving data corruption in software during execution. By contrast, today's optical networks hold on to a rigid error-free standard, which limits the scalability of network performance. In this work we propose high-bandwidth, low-latency approximate networks with the following three features: (1) optical links that exploit multi-level quadrature amplitude modulation (QAM) to achieve high bandwidth, (2) avoidance of forward error correction (FEC), which makes optical links error-prone but affords lower latency, and (3) the use of symbol mapping coding between bit sequences and QAM symbols to ensure data integrity that is sufficient for practical soft-error-tolerant applications. Discrete-event simulation results for application benchmarks show that approximate networks achieve speedups of up to 2.94 compared to conventional networks.
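As a rough illustration of why the mapping between bit sequences and QAM symbols matters on a link without FEC, the sketch below builds a Gray-coded square QAM constellation and checks that a symbol mistaken for its nearest neighbor corrupts only one bit. This is a textbook Gray mapping, assumed here purely for illustration; it is not the symbol mapping coding proposed in the paper, whose details the abstract does not give. All function names are hypothetical.

```python
def gray(n: int) -> int:
    """Return the binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def qam_constellation(bits_per_symbol: int = 4) -> dict:
    """Map each bit pattern to an (I, Q) point of a square QAM grid.

    Assumption: bits_per_symbol is even (4 -> 16-QAM, 6 -> 64-QAM).
    Each axis is Gray-coded independently, so horizontally or
    vertically adjacent points differ in exactly one bit.
    """
    half = bits_per_symbol // 2
    side = 1 << half  # number of amplitude levels per axis
    table = {}
    for i in range(side):
        for q in range(side):
            pattern = (gray(i) << half) | gray(q)
            # Center the grid on the origin (levels -3, -1, 1, 3 for 16-QAM).
            table[pattern] = (2 * i - side + 1, 2 * q - side + 1)
    return table


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def max_neighbor_bit_errors(table: dict) -> int:
    """Worst-case number of bit flips when a symbol is decoded as a nearest neighbor."""
    worst = 0
    for a, (ia, qa) in table.items():
        for b, (ib, qb) in table.items():
            # Nearest neighbors are one grid step (distance 2) apart on one axis.
            if abs(ia - ib) + abs(qa - qb) == 2:
                worst = max(worst, hamming(a, b))
    return worst


if __name__ == "__main__":
    table = qam_constellation(4)
    print(len(table), "symbols, worst-case neighbor bit errors:",
          max_neighbor_bit_errors(table))
```

Each 16-QAM symbol carries 4 bits and each 64-QAM symbol carries 6, which is where the bandwidth gain of multi-level modulation comes from; the cost is that denser constellations are more easily confused with their neighbors, so the choice of bit mapping determines how much damage each symbol error does.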
