Time Series Machine Learning Models for Precise SSD Access Latency Prediction

Impact factor 1.4; Zone 3 (Computer Science); JCR Q4 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

IEEE Computer Architecture Letters Pub Date : 2025-06-20 DOI:10.1109/LCA.2025.3581580

Bikrant Das Sharma;Houxiang Ji;Ipoom Jeong;Nam Sung Kim

IEEE Computer Architecture Letters, vol. 24, no. 2, pp. 233-236. Journal Article. https://ieeexplore.ieee.org/document/11045247/

Citations: 0

Abstract

Solid State Drives (SSDs) have become the dominant storage solution over the past few years. A key component of SSDs is the controller, which manages communication between the host and flash memory, optimizing data transfer speeds, integrity, and lifespan. However, modern SSDs function as closed boxes, as manufacturers do not disclose firmware and controller details. Meanwhile, read and write latencies are affected by various internal optimizations, such as wear-leveling and garbage collection, making precise latency prediction challenging. Existing approaches rely on trace-driven simulation or machine learning, but either (1) just classify operations into broad latency categories (e.g., fast or slow), including software stack overhead, or (2) make imprecise predictions while consuming significant system resources and time. For system simulation, latency predictions must be both fast and accurate, focusing solely on device-level delays excluding OS overhead, which is modeled separately. To tackle these challenges, this paper presents time series machine learning models to accurately predict hardware-only SSD latencies across diverse workloads. Our evaluation shows that the proposed model predicts 85%–95% of individual I/O latencies within a 10% error margin, outperforming existing simulators and ML models, which achieve only 6%–37% accuracy, while also providing 4×–255× speedups in prediction latency.
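The headline accuracy figure (the share of individual I/O latencies predicted within a 10% error margin) can be illustrated with a minimal sketch. The abstract does not spell out the exact error definition, so this assumes relative error measured against the observed device latency; the function name `fraction_within_margin` and the sample latencies are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def fraction_within_margin(
    actual: Sequence[float],
    predicted: Sequence[float],
    margin: float = 0.10,
) -> float:
    """Return the share of predictions whose relative error is <= margin.

    Assumption: relative error is |predicted - actual| / |actual|, i.e. it is
    measured against the observed latency. The paper may define it differently.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one latency sample is required")
    # Compare absolute error to margin * |actual| to avoid dividing by zero.
    hits = sum(
        1 for a, p in zip(actual, predicted) if abs(p - a) <= margin * abs(a)
    )
    return hits / len(actual)


# Made-up latencies in microseconds, for illustration only.
measured_us = [100.0, 200.0, 50.0, 400.0]
predicted_us = [105.0, 260.0, 50.0, 380.0]

score = fraction_within_margin(measured_us, predicted_us)
print(f"{score:.0%} of predictions fall within the 10% margin")
```

Under this reading, a reported 85%–95% means that for 85 to 95 of every 100 I/O requests the predicted latency lands within 10% of the measured one, which is a stricter per-request criterion than classifying requests as fast or slow.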


Source journal

IEEE Computer Architecture Letters (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore

4.60

Self-citation rate

4.30%


Journal description: IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. Submissions are welcomed on any topic in computer architecture, especially but not limited to: microprocessor and multiprocessor systems, microarchitecture and ILP processors, workload characterization, performance evaluation and simulation techniques, compiler-hardware and operating system-hardware interactions, interconnect architectures, memory and cache systems, power and thermal issues at the architecture level, I/O architectures and techniques, independent validation of previously published results, analysis of unsuccessful techniques, domain-specific processor architectures (e.g., embedded, graphics, network, etc.), real-time and high-availability architectures, reconfigurable systems.