A systematic fault-tolerant computational model for both crash failures and silent data corruption

2018 21st Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN). Pub Date: 2018-02-01. DOI: 10.1109/ICIN.2018.8401596

Xiaolong Cui, Zaeem Hussain, T. Znati, R. Melhem

Citations: 1

Abstract

As the boundaries between Cloud and HPC continue to blur, it is clear that there is an urgent demand for a systematic computational model that adapts to the computing platform and accommodates the underlying workloads. As computing systems continue to scale out to satisfy the increasingly large demands on computing capacity, power awareness and fault tolerance have become major concerns. This paper proposes a novel computational model that applies to both compute- and data-intensive workloads, and deals with diverse types of faults. Evaluation results demonstrate that the proposed model is able to achieve significant energy savings compared to existing fault tolerance techniques, while maintaining the same level of fault tolerance.
