{"title":"Optimizing the performance of in-memory file system by thread scheduling and file migration under NUMA multiprocessor systems","authors":"Ting Wu , Jingting He , Ying Qian , Weichen Liu","doi":"10.1016/j.sysarc.2025.103344","DOIUrl":null,"url":null,"abstract":"<div><div>Internet and IoT Applications generate large amounts of data that require efficient storage and processing. Emerging Compute Express Link (CXL) and Non-Volatile Memories (NVM) bring new opportunities for in-memory computing by reducing the latency of data access and processing. Many in-memory file systems based on the Hybrid DRAM/NVM are designed for high performance. However, achieving high performance under Non-Uniform Memory Access (NUMA) multiprocessor systems has significant challenges. In particular, the performance of file requests on NUMA systems varies over a disturbingly wide range, depending on the affinity of threads to file data. Moreover, memory controllers and interconnect links congestion bring excessive latency and performance loss on file accesses. Therefore, both the placement of file and thread and load balance are critical for data-intensive applications on NUMA systems. In this paper, we optimize the performance of multiple threads requesting in-memory files on NUMA systems by considering both memory congestion and data locality. First, we present the system model and formulate the problem as latency minimization on NUMA nodes. Then, we present a two-layer design to optimize the performance by properly migrating threads and dynamically adjusting the file distribution. Further, based on the design, we implement a functional NUMA-aware in-memory file system, Hydrafs-RFCT, in the Linux kernel. Experimental results show that the Hydrafs-RFCT optimizes the performance of multi-thread applications on NUMA systems. The average aggravated performance of Hydrafs-RFCT is 100.14 %, 112.7 %, 39.4 %, and 6.4 % higher than that of Ext4-DAX, PMFS, SIMFS, and Hydrafs, respectively.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"159 ","pages":"Article 103344"},"PeriodicalIF":3.7000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762125000165","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Internet and IoT Applications generate large amounts of data that require efficient storage and processing. Emerging Compute Express Link (CXL) and Non-Volatile Memories (NVM) bring new opportunities for in-memory computing by reducing the latency of data access and processing. Many in-memory file systems based on the Hybrid DRAM/NVM are designed for high performance. However, achieving high performance under Non-Uniform Memory Access (NUMA) multiprocessor systems has significant challenges. In particular, the performance of file requests on NUMA systems varies over a disturbingly wide range, depending on the affinity of threads to file data. Moreover, memory controllers and interconnect links congestion bring excessive latency and performance loss on file accesses. Therefore, both the placement of file and thread and load balance are critical for data-intensive applications on NUMA systems. In this paper, we optimize the performance of multiple threads requesting in-memory files on NUMA systems by considering both memory congestion and data locality. First, we present the system model and formulate the problem as latency minimization on NUMA nodes. Then, we present a two-layer design to optimize the performance by properly migrating threads and dynamically adjusting the file distribution. Further, based on the design, we implement a functional NUMA-aware in-memory file system, Hydrafs-RFCT, in the Linux kernel. Experimental results show that the Hydrafs-RFCT optimizes the performance of multi-thread applications on NUMA systems. The average aggravated performance of Hydrafs-RFCT is 100.14 %, 112.7 %, 39.4 %, and 6.4 % higher than that of Ext4-DAX, PMFS, SIMFS, and Hydrafs, respectively.
期刊介绍:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems including methodologies, techniques and tools for their design as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal hardware/software co-design methods that consider interplay between software and hardware components with and emphasis on software are also relevant here.