Pluggable AI-based real-time stragglers detection framework in Hadoop

Impact factor: 3.0 · JCR Q2 (Computer Science, Information Systems)

High-Confidence Computing · Pub date: 2026-03-01 · Epub date: 2025-07-03 · DOI: 10.1016/j.hcc.2025.100341

Xinyuan Liu, Yinhao Li, Rajiv Ranjan, Devki Nandan Jha

Volume 6, Issue 1, Article 100341 · https://www.sciencedirect.com/science/article/pii/S2667295225000455

Citations: 0

Abstract

The growing reliance on big data frameworks such as Hadoop has revolutionized data processing across many domains, enabling large-scale storage and distributed computation. Hadoop is widely used in real-world applications such as high-performance computing tasks, e-commerce, and healthcare data analysis. However, the efficiency of Hadoop systems is often hampered by faults and anomalies, with stragglers (tasks that run markedly slower than their peers) emerging as one of the most prevalent issues. Stragglers disrupt workflows, waste resources, and degrade system performance. Existing anomaly detection models rely on methods such as median analysis or static thresholds, but they often suffer from high false-positive rates, poor adaptability, and weak handling of complex heterogeneous environments. To address these challenges, this paper presents Plabs, a flexible straggler detection framework for Hadoop. The framework comprises two core components: (1) a Monitoring Module that tracks cluster resources and task progress in real time, and (2) a Pluggable AI-based straggler detection module designed for precise identification of straggler tasks. By combining advanced monitoring with AI-driven analysis, Plabs offers an automated, flexible, and scalable solution for detecting stragglers at run time in Hadoop clusters. We evaluated Plabs exhaustively with three Machine Learning (ML) models, two Deep Learning (DL) models, and two Large Language Models (LLMs) on five different applications in a real testbed environment. Our experimental evaluation shows that the DL models outperform the others in identifying Hadoop stragglers, achieving superior accuracy and reliability across all five applications.
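The abstract describes Plabs as a monitoring module feeding a pluggable detection module. The sketch below is not the authors' code; it is a minimal illustration, under assumed names (`TaskSample`, `StragglerDetector`, `MedianRateDetector`, `StaticThresholdDetector`, `StragglerMonitor`, all hypothetical), of how a shared detector interface lets a median-based or static-threshold baseline, or a trained ML, DL, or LLM-backed model, be swapped in without changing the monitoring side. The snapshot schema and the 0.5 median cut-off are assumptions chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median
from typing import Protocol, Sequence


@dataclass(frozen=True)
class TaskSample:
    """One monitoring snapshot of a running task (hypothetical schema)."""
    task_id: str
    progress: float   # fraction of work completed, in [0, 1]
    elapsed_s: float  # seconds since the task started
    cpu_util: float   # host CPU utilisation in [0, 1]; unused by the baselines below


class StragglerDetector(Protocol):
    """Hypothetical plug-in interface: return the IDs of suspected stragglers."""

    def detect(self, samples: Sequence[TaskSample]) -> set[str]: ...


class MedianRateDetector:
    """Baseline: flag tasks whose progress rate is below a fraction of the median rate."""

    def __init__(self, slowdown: float = 0.5) -> None:
        self.slowdown = slowdown  # assumed cut-off relative to the median rate

    def detect(self, samples: Sequence[TaskSample]) -> set[str]:
        # Progress rate = fraction completed per second; skip tasks that have not started.
        rates = {s.task_id: s.progress / s.elapsed_s for s in samples if s.elapsed_s > 0}
        if not rates:
            return set()
        cutoff = self.slowdown * median(rates.values())
        return {tid for tid, rate in rates.items() if rate < cutoff}


class StaticThresholdDetector:
    """Baseline: flag any unfinished task that has run longer than a fixed limit."""

    def __init__(self, max_elapsed_s: float) -> None:
        self.max_elapsed_s = max_elapsed_s

    def detect(self, samples: Sequence[TaskSample]) -> set[str]:
        return {s.task_id for s in samples
                if s.progress < 1.0 and s.elapsed_s > self.max_elapsed_s}


class StragglerMonitor:
    """Feeds monitoring snapshots to whichever detector is currently plugged in."""

    def __init__(self, detector: StragglerDetector) -> None:
        self.detector = detector

    def plug(self, detector: StragglerDetector) -> None:
        # Swap the detector without touching the monitoring side.
        self.detector = detector

    def poll(self, samples: Sequence[TaskSample]) -> set[str]:
        return self.detector.detect(samples)


# Toy snapshots (made-up values for illustration only).
samples = [
    TaskSample("t1", progress=0.8, elapsed_s=100.0, cpu_util=0.5),
    TaskSample("t2", progress=0.7, elapsed_s=100.0, cpu_util=0.5),
    TaskSample("t3", progress=0.2, elapsed_s=120.0, cpu_util=0.9),
]
monitor = StragglerMonitor(MedianRateDetector())
print(sorted(monitor.poll(samples)))  # → ['t3']
monitor.plug(StaticThresholdDetector(max_elapsed_s=90.0))
print(sorted(monitor.poll(samples)))  # → ['t1', 't2', 't3']
```

On these made-up samples the median baseline flags only `t3`, while a 90 s static threshold flags all three tasks, a toy illustration of the false-positive concern the abstract raises about static thresholds. A learned detector would plug in the same way by implementing `detect`, and could use resource fields such as `cpu_util` that the simple baselines ignore.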

Source journal: High-Confidence Computing

CiteScore: 4.70

Self-citation rate: 0.00%

Articles published: