Jiaxing Qi;Zhongzhi Luan;Shaohan Huang;Carol Fung;Hailong Yang
{"title":"LogSay: An Efficient Comprehension System for Log Numerical Reasoning","authors":"Jiaxing Qi;Zhongzhi Luan;Shaohan Huang;Carol Fung;Hailong Yang","doi":"10.1109/TC.2024.3386068","DOIUrl":"10.1109/TC.2024.3386068","url":null,"abstract":"With the growth of smart systems and applications, high-volume logs are generated that record important data for system maintenance. System developers are usually required to analyze logs to track the status of the system or applications. Therefore, it is essential for them to find answers in large-scale logs when questions arise. In this work, we design a multi-step <italic>“Retriever-Reader”</italic> question-answering system, namely LogSay, which aims at predicting answers accurately and efficiently. Our system can not only answer simple questions, such as returning a log segment or span, but also answer complex logical questions through numerical reasoning. LogSay has two key components: <italic>Log Retriever</italic> and <italic>Log Reasoner</italic>, and we designed five operators to implement them. <italic>Log Retriever</italic> retrieves relevant logs based on a question. Then, <italic>Log Reasoner</italic> performs numerical reasoning to infer the final answer. In addition, due to the lack of available question-answering datasets for system logs, we constructed question-answering datasets based on three public log datasets and will make them publicly available. 
Our evaluation results show that LogSay outperforms state-of-the-art approaches in terms of accuracy and efficiency.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1809-1821"},"PeriodicalIF":3.7,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
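As an editorial illustration of the “Retriever-Reader” idea summarized in the abstract above — a retriever filters relevant log lines for a question, then a reasoner applies a numerical operator to them. All function names, operators, and log lines below are invented for this sketch; they are not LogSay's actual API or dataset.

```python
import re

LOGS = [
    "2024-04-08 10:00:01 worker-1 task finished in 120 ms",
    "2024-04-08 10:00:02 worker-2 task failed",
    "2024-04-08 10:00:03 worker-1 task finished in 80 ms",
]

def retrieve(logs, keyword):
    """Retriever step: keep only log lines relevant to the question keyword."""
    return [line for line in logs if keyword in line]

def reason_count(lines):
    """Reasoner operator: count the retrieved lines."""
    return len(lines)

def reason_sum_ms(lines):
    """Reasoner operator: sum the latencies (in ms) found in the lines."""
    return sum(int(m.group(1)) for line in lines
               if (m := re.search(r"in (\d+) ms", line)))

finished = retrieve(LOGS, "finished")
print(reason_count(finished))   # 2
print(reason_sum_ms(finished))  # 200
```

A multi-step question ("total time of finished tasks?") composes the two stages: retrieval narrows the search space, then the numerical operator produces the answer.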
Hang Huang;Honglei Wang;Jia Rao;Song Wu;Hao Fan;Chen Yu;Hai Jin;Kun Suo;Lisong Pan
{"title":"vKernel: Enhancing Container Isolation via Private Code and Data","authors":"Hang Huang;Honglei Wang;Jia Rao;Song Wu;Hao Fan;Chen Yu;Hai Jin;Kun Suo;Lisong Pan","doi":"10.1109/TC.2024.3383988","DOIUrl":"10.1109/TC.2024.3383988","url":null,"abstract":"Container technology is increasingly adopted in cloud environments. However, the lack of isolation in the shared kernel becomes a significant barrier to the wide adoption of containers. The challenge lies in how to simultaneously attain high performance and isolation. On the one hand, kernel-level isolation mechanisms, such as <italic>seccomp</italic>, <italic>capabilities</italic>, and <italic>apparmor</italic>, achieve good performance without much overhead, but lack support for per-container customization. On the other hand, user-level and VM-based isolation offer superior security guarantees and allow for customization, since a container is assigned a dedicated kernel, but at the cost of high overhead. We present vKernel, a kernel isolation framework. It maintains a minimal set of code and data that are either sensitive or prone to interference in a <italic>vKernel Instance</italic> (vKI). vKernel relies on inline hooks to intercept requests sent to the host kernel and redirect them to a vKI, where container-specific security rules, functions, and data are implemented. Through case studies, we demonstrate that, under vKernel, user-defined data isolation and kernel customization can be supported with reasonable engineering effort. 
An evaluation of vKernel with micro-benchmarks, cloud services, and real-world applications shows that vKernel achieves strong security guarantees with much lower overhead.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1711-1723"},"PeriodicalIF":3.7,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10494778","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhang Jiang;Xianduo Li;Tianxiang Peng;Haoran Li;Jingxuan Hong;Jin Zhang;Xiaoli Gong
{"title":"Hybrid-Memcached: A Novel Approach for Memcached Persistence Optimization With Hybrid Memory","authors":"Zhang Jiang;Xianduo Li;Tianxiang Peng;Haoran Li;Jingxuan Hong;Jin Zhang;Xiaoli Gong","doi":"10.1109/TC.2024.3385279","DOIUrl":"10.1109/TC.2024.3385279","url":null,"abstract":"Memcached is a widely adopted, high-performance, in-memory key-value object caching system utilized in data centers. Nonetheless, its data is stored in volatile DRAM, making the cached data susceptible to loss during system shutdowns. Consequently, cold restarts experience significant delays. Persistent memory is a byte-addressable, large-capacity, non-volatile storage medium that can be employed to avoid the cold-restart problem. However, deploying Memcached on persistent memory requires consideration of issues such as write endurance, asymmetric read/write latency and bandwidth, and the write granularity of persistent memory. In this paper, we propose Hybrid-Memcached, an optimized Memcached framework based on a hybrid combination of DRAM and persistent memory. Hybrid-Memcached includes three key components: (1) a DRAM-based data aggregation buffer to avoid multiple fine-grained writes, which extends the write endurance of persistent memory, (2) a data-object alignment mechanism to avoid write amplification, and (3) a non-temporal store instruction-based writing strategy to improve bandwidth utilization. We have implemented Hybrid-Memcached on Intel Optane persistent memory. Several micro-benchmarks are designed to evaluate Hybrid-Memcached by varying read/write ratios, access distributions, and key-value item sizes. 
Additionally, we evaluated it with the YCSB benchmark, showing a 21.2% performance improvement for fully write-intensive workloads and 11.8% for read-write balanced workloads.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1866-1874"},"PeriodicalIF":3.7,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
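The DRAM-based aggregation buffer described in the abstract above can be sketched as follows. This is an illustrative model, not Hybrid-Memcached's code: the 256-byte flush unit reflects Intel Optane's internal write-block size, but the class, names, and item sizes are assumptions for the sketch.

```python
FLUSH_GRANULARITY = 256  # bytes; Optane PM internally writes 256-B blocks

class AggregationBuffer:
    """Accumulate small key-value writes in DRAM; flush to persistent
    memory only in coarse, granularity-sized units to spare PM endurance."""

    def __init__(self):
        self.pending = []       # buffered (key, value) items in DRAM
        self.pending_bytes = 0
        self.flushes = []       # each entry = one coarse write to PM

    def put(self, key, value):
        self.pending.append((key, value))
        self.pending_bytes += len(key) + len(value)
        if self.pending_bytes >= FLUSH_GRANULARITY:
            self.flush()

    def flush(self):
        if self.pending:
            self.flushes.append(list(self.pending))
            self.pending.clear()
            self.pending_bytes = 0

buf = AggregationBuffer()
for i in range(32):
    buf.put(f"key{i}", b"x" * 12)  # 32 fine-grained application writes
buf.flush()                         # drain whatever remains
print(len(buf.flushes))  # 2 coarse PM writes instead of 32 fine-grained ones
```

Batching many sub-granularity writes into two aligned flushes is what spares the medium's write endurance; the alignment mechanism in component (2) additionally keeps each flush from straddling write-block boundaries.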
{"title":"LAC: A Workload Intensity-Aware Caching Scheme for High-Performance SSDs","authors":"Hui Sun;Haoqiang Tong;Yinliang Yue;Xiao Qin","doi":"10.1109/TC.2024.3385290","DOIUrl":"10.1109/TC.2024.3385290","url":null,"abstract":"Inside a NAND Flash-based solid-state disk (SSD), utilizing DRAM-based write-back caching is a practical approach to bolstering SSD performance. Existing caching schemes overlook the problem of high user I/O intensity caused by the dramatic increase in I/O accesses. High I/O intensity causes access conflicts among I/O requests inside an SSD: a large number of requests are blocked, impairing response time. Conventional passive-update caching schemes merely replace pages upon access misses when the cache is full, so tail latency arises under high I/O intensity. Active write-back caching schemes utilize idle time among requests, coupled with free internal bandwidth, to flush dirty data into flash memory in advance, lowering response time. Frequent active write-back operations, however, cause access conflicts among requests – a culprit that increases write amplification (WA) and degrades SSD lifetime. We address the above issues by proposing a <italic>work<bold>L</bold>oad</italic> intensity-aware and <bold><italic>A</italic></bold>ctive parallel <bold><italic>Caching</italic></bold> scheme – LAC – that is powered by collaborative-load awareness. LAC fends off user I/O access conflicts under high-intensity workloads. If the I/O intensity is low – intervals between consecutive I/O requests are large – and the target die is free, LAC actively and concurrently writes dirty data of adjacent addresses back to the die, producing clean data through active write-back. Preferentially replacing clean data reduces response time and prevents flash transactions from being blocked. We devise a data protection method that writes back cold data based on various criteria during cache replacement and active write-backs. 
Thus, LAC reduces the WA incurred by actively writing back hot data and extends SSD lifetime. We compare LAC against six caching schemes (LRU, CFLRU, GCaR-LRU, MQSim, VS-Batch, and Co-Active) in the modern MQSim simulator. The results show that LAC reduces response time and erase count by up to 78.5% and 47.8% (64.4% and 16.6% on average), respectively.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1738-1752"},"PeriodicalIF":3.7,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
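The intensity-aware active write-back described in the LAC abstract can be sketched as follows — an editorial illustration, not LAC's implementation; the threshold, class, and method names are all assumptions. The point is the trigger condition: a large inter-request gap (low intensity) plus a free die lets dirty pages become clean early, so later evictions need no blocking flash write.

```python
IDLE_THRESHOLD = 5.0  # assumed idle-gap threshold, in arbitrary time units

class WriteBackCache:
    def __init__(self):
        self.dirty = {}   # page -> data still awaiting a flash write
        self.clean = {}   # pages already persisted; cheap to evict

    def on_request_gap(self, gap, die_free):
        # Low I/O intensity (large gap) plus a free die => active write-back.
        if gap >= IDLE_THRESHOLD and die_free:
            for page in list(self.dirty):
                self.flush(page)

    def flush(self, page):
        # Model one write of dirty data back to the die: page becomes clean.
        self.clean[page] = self.dirty.pop(page)

cache = WriteBackCache()
cache.dirty = {"p1": b"a", "p2": b"b"}
cache.on_request_gap(gap=9.0, die_free=True)  # idle period detected
print(sorted(cache.clean))  # ['p1', 'p2'] — cleaned ahead of any eviction
```

Under a bursty workload the same call with a small gap (or a busy die) would leave the dirty set untouched, which is how the scheme avoids adding write-back traffic during high-intensity phases.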
Eric Guthmuller;César Fuguet;Andrea Bocco;Jérôme Fereyre;Riccardo Alidori;Ihsane Tahir;Yves Durand
{"title":"Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Computation","authors":"Eric Guthmuller;César Fuguet;Andrea Bocco;Jérôme Fereyre;Riccardo Alidori;Ihsane Tahir;Yves Durand","doi":"10.1109/TC.2024.3383964","DOIUrl":"10.1109/TC.2024.3383964","url":null,"abstract":"A key concern in the field of scientific computation is the convergence of numerical solvers when applied to large problems. The numerical workarounds used to improve convergence are often problem-specific, time-consuming, and require skilled numerical analysts. An alternative is to simply increase the working precision of the computation, but this is difficult due to the lack of efficient hardware support for extended precision. We propose <i>Xvpfloat</i>, a RISC-V ISA extension for dynamically variable and extended precision computation, a hardware implementation, and a full software stack. Our architecture provides a comprehensive implementation of this ISA, with up to 512 bits of significand, including full support for common rounding modes and heterogeneous precision arithmetic operations. The memory subsystem handles IEEE 754 extendable formats, and features specialized indexed loads and stores with hardware-assisted prefetching. This processor can either operate standalone or as an accelerator for a general-purpose host. We demonstrate that the number of solver iterations can be reduced up to <inline-formula><tex-math>$5\\boldsymbol{\\times}$</tex-math></inline-formula> and, for certain difficult problems, convergence is only possible with very high precision (<inline-formula><tex-math>$\\boldsymbol{\\geq}$</tex-math></inline-formula>384 bits). 
This accelerator provides a new approach to accelerating large-scale scientific computing.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1683-1697"},"PeriodicalIF":3.7,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10488759","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Honghong Zeng;Jie Li;Jiong Lou;Shijing Yuan;Chentao Wu;Wei Zhao;Sijin Wu;Zhiwen Wang
{"title":"BSR-FL: An Efficient Byzantine-Robust Privacy-Preserving Federated Learning Framework","authors":"Honghong Zeng;Jie Li;Jiong Lou;Shijing Yuan;Chentao Wu;Wei Zhao;Sijin Wu;Zhiwen Wang","doi":"10.1109/TC.2024.3404102","DOIUrl":"10.1109/TC.2024.3404102","url":null,"abstract":"Federated learning (FL) is a technique that enables clients to collaboratively train a model by sharing local models instead of raw private data. However, existing reconstruction attacks can recover sensitive training samples from the shared models. Additionally, emerging poisoning attacks pose severe threats to the security of FL. Unfortunately, most existing Byzantine-robust privacy-preserving federated learning solutions either reduce the accuracy of aggregated models or introduce significant computation and communication overheads. In this paper, we propose a novel <underline>B</underline>lockchain-based <underline>S</underline>ecure and <underline>R</underline>obust <underline>F</underline>ederated <underline>L</underline>earning (BSR-FL) framework to mitigate reconstruction attacks and poisoning attacks. BSR-FL avoids accuracy loss while ensuring efficient privacy protection and Byzantine robustness. Specifically, we first construct a lightweight non-interactive functional encryption (NIFE) scheme to protect the privacy of local models while maintaining high communication performance. Then, we propose a privacy-preserving defensive aggregation strategy based on NIFE, which can resist encrypted poisoning attacks without compromising model privacy through secure cosine similarity and incentive-based Byzantine-tolerance aggregation. Finally, we utilize a blockchain system to facilitate the federated learning process and the implementation of the protocols. 
Extensive theoretical analysis and experiments demonstrate that BSR-FL achieves enhanced privacy, robustness, and high efficiency.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 8","pages":"2096-2110"},"PeriodicalIF":3.6,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141147534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
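BSR-FL performs its cosine-similarity defense over NIFE ciphertexts with incentive weighting; the plaintext sketch below shows only the underlying filtering idea, with all names, the median reference direction, and the threshold chosen for illustration. Updates pointing away from a robust reference direction are dropped before averaging.

```python
import math
import statistics

def cosine(u, v):
    """Cosine similarity between two update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def robust_aggregate(updates, threshold=0.0):
    dim = len(updates[0])
    # Robust reference direction: coordinate-wise median of all updates
    # (a mean would itself be skewed by the Byzantine update).
    ref = [statistics.median(u[i] for u in updates) for i in range(dim)]
    kept = [u for u in updates if cosine(u, ref) > threshold]
    return [sum(u[i] for u in kept) / len(kept) for i in range(dim)]

honest = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
poisoned = [[-5.0, -5.0]]          # Byzantine update pointing the other way
agg = robust_aggregate(honest + poisoned)
print(agg)  # close to [1.0, 1.0]: the poisoned update is filtered out
```

The encrypted variant computes the same similarity scores without revealing the individual model updates, which is what the NIFE construction provides.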
{"title":"BlockCompass: A Benchmarking Platform for Blockchain Performance","authors":"Mohammadreza Rasolroveicy;Wejdene Haouari;Marios Fokaefs","doi":"10.1109/TC.2024.3404103","DOIUrl":"10.1109/TC.2024.3404103","url":null,"abstract":"Blockchain technology has gained momentum due to its immutability and transparency. Several blockchain platforms, each with different consensus protocols, have been proposed. However, choosing and configuring such a platform is a non-trivial task. Numerous benchmarking tools have been introduced to test the performance of blockchain solutions. Yet, these tools are often limited to specific blockchain platforms or require complex configurations. Moreover, they tend to focus on one-off batch evaluation models, which may not be ideal for longer-running instances under continuous workloads. In this work, we present <italic>BlockCompass</italic>, an all-inclusive blockchain benchmarking tool that can be easily configured and extended. We demonstrate how <italic>BlockCompass</italic> can evaluate the performance of various blockchain platforms and configurations, including Ethereum Proof-of-Authority, Ethereum Proof-of-Work, Hyperledger Fabric Raft, Hyperledger Sawtooth with Proof-of-Elapsed-Time, Practical Byzantine Fault Tolerance, and Raft consensus algorithms, against workloads that continuously fluctuate over time. We show how continuous transactional workloads may be more appropriate than batch workloads in capturing certain stressful events for the system. 
Finally, we present the results of a usability study on the convenience and effectiveness offered by <italic>BlockCompass</italic> in blockchain benchmarking.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 8","pages":"2111-2122"},"PeriodicalIF":3.6,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141147537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Machine Learning-Empowered Cache Management Scheme for High-Performance SSDs","authors":"Hui Sun;Chen Sun;Haoqiang Tong;Yinliang Yue;Xiao Qin","doi":"10.1109/TC.2024.3404064","DOIUrl":"10.1109/TC.2024.3404064","url":null,"abstract":"NAND Flash-based solid-state drives (SSDs) have gained widespread usage in data storage thanks to their exceptional performance and low power consumption. The computational capability of SSDs has been elevated to tackle complex algorithms. Inside an SSD, a DRAM cache for frequently accessed requests reduces response time and write amplification (WA), thereby improving SSD performance and lifetime. Existing caching schemes, based on temporal locality, overlook variations in that locality, which potentially reduces cache hit rates. Some caching schemes bolster performance via flash-aware techniques, but at the expense of the cache hit rate. To address these issues, we propose a random-forest machine learning <bold>C</bold>lassifier-empowered <bold>C</bold>ache scheme named CCache, where I/O requests are classified into critical, intermediate, and non-critical ones according to their access status. After designing a machine learning model to predict these three types of requests, we implement a trie-level linked list to manage cache placement and replacement. CCache safeguards critical requests for cache service to the greatest extent, while granting the highest priority to evicting data accessed by non-critical requests. CCache – considering chip state when processing non-critical requests – is implemented in an SSD simulator (SSDSim). CCache outperforms alternative caching schemes, including LRU, CFLRU, LCR, NCache, ML_WP, and CCache_ANN, in terms of response time, WA, erase count, and hit ratio. The performance discrepancy between CCache and the OPT scheme is marginal. For example, CCache reduces the response time of the competitors by up to 41.9%, with an average of 16.1%. 
CCache slashes erase counts by a maximum of 67.4%, with an average of 21.3%. The performance gap between CCache and OPT is merely 2.0%-3.0%.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 8","pages":"2066-2080"},"PeriodicalIF":3.6,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141147533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
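The classifier-guided eviction policy in the CCache abstract can be illustrated as follows. CCache uses a trained random forest and a trie-level linked list; in this editorial sketch a simple frequency rule stands in for the model, and an ordered dict stands in for the trie — all names and thresholds are invented. Eviction removes non-critical entries first, falling back to LRU order within a class.

```python
from collections import OrderedDict

CRITICAL, INTERMEDIATE, NON_CRITICAL = 0, 1, 2

def predict_class(access_count):
    """Stand-in for CCache's trained random forest: a frequency rule."""
    if access_count >= 4:
        return CRITICAL
    if access_count >= 2:
        return INTERMEDIATE
    return NON_CRITICAL

class ClassifierCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # page -> access count, in LRU order

    def access(self, page):
        self.entries[page] = self.entries.get(page, 0) + 1
        self.entries.move_to_end(page)        # most recently used at the end
        if len(self.entries) > self.capacity:
            self.evict()

    def evict(self):
        # Highest eviction priority: worst class; ties broken by LRU position.
        order = list(self.entries)
        victim = max(order, key=lambda p: (predict_class(self.entries[p]),
                                           -order.index(p)))
        del self.entries[victim]

cache = ClassifierCache(capacity=2)
for _ in range(4):
    cache.access("a")   # "a" becomes critical after 4 accesses
cache.access("b")       # non-critical
cache.access("c")       # overflow: non-critical "b" is evicted, not "a"
print(sorted(cache.entries))  # ['a', 'c']
```

A plain LRU would have evicted "a" here despite its heavy reuse; protecting predicted-critical entries is what lifts the hit ratio.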
Yunkun Liao;Jingya Wu;Wenyan Lu;Xiaowei Li;Guihai Yan
{"title":"DPU-Direct: Unleashing Remote Accelerators via Enhanced RDMA for Disaggregated Datacenters","authors":"Yunkun Liao;Jingya Wu;Wenyan Lu;Xiaowei Li;Guihai Yan","doi":"10.1109/TC.2024.3404089","DOIUrl":"10.1109/TC.2024.3404089","url":null,"abstract":"This paper presents DPU-Direct, an accelerator disaggregation system that connects accelerator nodes (ANs) and CPU nodes (CNs) over a standard Remote Direct Memory Access (RDMA) network. DPU-Direct eliminates the latency introduced by the CPU-based network stack and by PCIe interconnects between network I/O and the accelerator. The DPU-Direct system architecture includes a DPU Wrapper hardware architecture, an RDMA-based Accelerator Access Pattern (RAAP), and a CN-side programming model. The DPU Wrapper connects accelerators directly with the RDMA engine, turning ANs into disaggregation-native devices. The RAAP provides the CN with low-latency and high-throughput accelerator semantics based on standard RDMA operations. Our FPGA prototype demonstrates DPU-Direct's efficacy with two proof-of-concept applications: AES encryption and key-value cache, which are computationally intensive and latency-sensitive. DPU-Direct yields a 400x speedup in AES encryption over the CPU baseline and matches the performance of the locally integrated AES accelerator. 
For key-value cache, DPU-Direct reduces the average end-to-end latency by 1.66x for GETs and 1.30x for SETs over the CPU-RDMA-Polling baseline, reducing latency jitter by over 10x for both operations.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 8","pages":"2081-2095"},"PeriodicalIF":3.6,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141147535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dengcheng Hu;Jianrong Wang;Xiulong Liu;Qi Li;Keqiu Li
{"title":"LMChain: An Efficient Load-Migratable Beacon-Based Sharding Blockchain System","authors":"Dengcheng Hu;Jianrong Wang;Xiulong Liu;Qi Li;Keqiu Li","doi":"10.1109/TC.2024.3404057","DOIUrl":"10.1109/TC.2024.3404057","url":null,"abstract":"Sharding is an important technique that exploits group parallelism to enhance the scalability and performance of blockchain. However, existing solutions reallocate shards based on historical transactions, which cannot handle temporary overload and incurs additional overhead during the reallocation process. To this end, this paper proposes LMChain, an efficient load-migratable beacon-based sharding blockchain system. The primary goal of LMChain is to eliminate reliance on historical transactions and achieve high performance. Specifically, we redesign the state-maintenance data structure in the Beacon Shard to effectively manage all account states at the shard level. Then, we propose a load-migratable transaction processing protocol built upon the new data structure. To mitigate read-write conflicts during the selection of migration transactions, we adopt a novel graph partitioning scheme. We also adopt a relay-based method to handle cross-shard transactions and resolve inter-shard state read-write conflicts. We implement an LMChain prototype and conduct experiments in a real network environment comprising 17 cloud servers. 
Experimental results show that, compared with state-of-the-art solutions, LMChain effectively reduces the average waiting latency of overloaded transactions by 30% to 48% across different cases with 16 transaction shards, while improving throughput by 3% to 10%.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 9","pages":"2178-2191"},"PeriodicalIF":3.6,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141147547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}