{"title":"LLM系统中内存相关软件老化的实验研究","authors":"César Santos , Fumio Machida , Ermeson Andrade","doi":"10.1016/j.jss.2025.112653","DOIUrl":null,"url":null,"abstract":"<div><div>Large Language Models (LLMs) have been increasingly adopted in a wide range of applications, many of which require long-running inference processes. However, these systems may be subject to software aging phenomena, leading to progressive performance degradation and potential failures. In this work, we experimentally investigate memory-related software aging in LLM inference. We performed 48-hour experiments with three open-source models (Pythia, OPT, and GPT-Neo) under low, medium, and high workloads, monitoring memory consumption at both system and process levels. Using the Mann–Kendall test and Sen’s slope estimator, we observed monotonic growth in RAM usage across all models on Central Processing Units (CPUs), with OPT presenting the steepest slopes. Process-level analysis further revealed that LLM processes were the primary contributors to memory growth, along with background services. Additionally, we conducted identical experiments on Graphics Processing Units (GPUs). Unlike the experiments without a GPU, GPU-based experiments revealed bounded oscillations and abrupt resets likely due to driver-level memory management, while host RAM and process-level monitoring still revealed clear symptoms of aging. 
These findings demonstrate that software aging manifests differently across execution environments, reinforcing the need for environment-specific monitoring approaches.</div></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"231 ","pages":"Article 112653"},"PeriodicalIF":4.1000,"publicationDate":"2025-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Experimental investigation of memory-related software aging in LLM systems\",\"authors\":\"César Santos , Fumio Machida , Ermeson Andrade\",\"doi\":\"10.1016/j.jss.2025.112653\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Large Language Models (LLMs) have been increasingly adopted in a wide range of applications, many of which require long-running inference processes. However, these systems may be subject to software aging phenomena, leading to progressive performance degradation and potential failures. In this work, we experimentally investigate memory-related software aging in LLM inference. We performed 48-hour experiments with three open-source models (Pythia, OPT, and GPT-Neo) under low, medium, and high workloads, monitoring memory consumption at both system and process levels. Using the Mann–Kendall test and Sen’s slope estimator, we observed monotonic growth in RAM usage across all models on Central Processing Units (CPUs), with OPT presenting the steepest slopes. Process-level analysis further revealed that LLM processes were the primary contributors to memory growth, along with background services. Additionally, we conducted identical experiments on Graphics Processing Units (GPUs). Unlike the experiments without a GPU, GPU-based experiments revealed bounded oscillations and abrupt resets likely due to driver-level memory management, while host RAM and process-level monitoring still revealed clear symptoms of aging. 
These findings demonstrate that software aging manifests differently across execution environments, reinforcing the need for environment-specific monitoring approaches.</div></div>\",\"PeriodicalId\":51099,\"journal\":{\"name\":\"Journal of Systems and Software\",\"volume\":\"231 \",\"pages\":\"Article 112653\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2025-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems and Software\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S016412122500322X\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems and Software","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S016412122500322X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Experimental investigation of memory-related software aging in LLM systems
Large Language Models (LLMs) have been increasingly adopted in a wide range of applications, many of which require long-running inference processes. However, these systems may be subject to software aging phenomena, leading to progressive performance degradation and potential failures. In this work, we experimentally investigate memory-related software aging in LLM inference. We performed 48-hour experiments with three open-source models (Pythia, OPT, and GPT-Neo) under low, medium, and high workloads, monitoring memory consumption at both system and process levels. Using the Mann–Kendall test and Sen’s slope estimator, we observed monotonic growth in RAM usage across all models on Central Processing Units (CPUs), with OPT presenting the steepest slopes. Process-level analysis further revealed that LLM processes were the primary contributors to memory growth, along with background services. Additionally, we conducted identical experiments on Graphics Processing Units (GPUs). Unlike the experiments without a GPU, GPU-based experiments revealed bounded oscillations and abrupt resets likely due to driver-level memory management, while host RAM and process-level monitoring still revealed clear symptoms of aging. These findings demonstrate that software aging manifests differently across execution environments, reinforcing the need for environment-specific monitoring approaches.
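The abstract's trend-detection step can be sketched in code. The following is an illustrative example only, not the authors' implementation: a minimal pure-Python version of the Mann–Kendall S statistic and Sen's slope estimator, applied to a synthetic RAM-usage series standing in for the paper's 48-hour measurements. The function names and sample data are assumptions made for illustration.

```python
# Hedged sketch (not the paper's code): Mann-Kendall S statistic and
# Sen's slope estimator for detecting monotonic growth in a memory series.
from itertools import combinations
from statistics import median

def mann_kendall_s(series):
    """Mann-Kendall S: sum of signs of all pairwise differences x_j - x_i
    with i < j. A strongly positive S indicates an upward monotonic trend."""
    s = 0
    for (i, xi), (j, xj) in combinations(enumerate(series), 2):
        s += (xj > xi) - (xj < xi)  # sign of the pairwise difference
    return s

def sens_slope(series):
    """Sen's slope: median of the pairwise slopes (x_j - x_i) / (j - i),
    a robust estimate of the growth rate per sample interval."""
    slopes = [(xj - xi) / (j - i)
              for (i, xi), (j, xj) in combinations(enumerate(series), 2)]
    return median(slopes)

if __name__ == "__main__":
    # Synthetic RAM samples (MB) with a slow upward drift plus small
    # oscillation, mimicking aging symptoms in a long-running process.
    ram_mb = [1200 + 3 * t + (t % 5) for t in range(48)]
    print("S statistic:", mann_kendall_s(ram_mb))  # positive -> growth trend
    print("Sen's slope:", sens_slope(ram_mb))      # MB per sample interval
```

In practice the S statistic is normalized and compared against a significance threshold (as done via the full Mann–Kendall test in the paper); this sketch only shows the raw statistics the test is built on.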
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
•Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
•Agile, model-driven, service-oriented, open source and global software development
•Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
•Human factors and management concerns of software development
•Data management and big data issues of software systems
•Metrics and evaluation, data mining of software development resources
•Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.