Latest Articles in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Time-Triggered Scheduling for Nonpreemptive Real-Time DAG Tasks Using 1-Opt Local Search
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3442985
Sen Wang;Dong Li;Shao-Yu Huang;Xuanliang Deng;Ashrarul H. Sifat;Jia-Bin Huang;Changhee Jung;Ryan Williams;Haibo Zeng
{"title":"Time-Triggered Scheduling for Nonpreemptive Real-Time DAG Tasks Using 1-Opt Local Search","authors":"Sen Wang;Dong Li;Shao-Yu Huang;Xuanliang Deng;Ashrarul H. Sifat;Jia-Bin Huang;Changhee Jung;Ryan Williams;Haibo Zeng","doi":"10.1109/TCAD.2024.3442985","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3442985","url":null,"abstract":"Modern real-time systems often involve numerous computational tasks characterized by intricate dependency relationships. Within these systems, data propagate through cause–effect chains from one task to another, making it imperative to minimize end-to-end latency to ensure system safety and reliability. In this article, we introduce innovative nonpreemptive scheduling techniques designed to reduce the worst-case end-to-end latency and/or time disparity for task sets modeled with directed acyclic graphs (DAGs). This is challenging because of the noncontinuous and nonconvex characteristics of the objective functions, hindering the direct application of standard optimization frameworks. Customized optimization frameworks aiming at achieving optimal solutions may suffer from scalability issues, while general heuristic algorithms often lack theoretical performance guarantees. To address this challenge, we incorporate the “1-opt” concept from the optimization literature (Essentially, 1-opt means that the quality of a solution cannot be improved if only one single variable can be changed) into the design of our algorithm. We propose a novel optimization algorithm that effectively balances the tradeoff between theoretical guarantees and algorithm scalability. By demonstrating its theoretical performance guarantees, we establish that the algorithm produces 1-opt solutions while maintaining polynomial run-time complexity. Through extensive large-scale experiments, we demonstrate that our algorithm can effectively reduce the latency metrics by 20% to 40%, compared to state-of-the-art methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3650-3661"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
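Note: to make the "1-opt" idea concrete, the following is a minimal, hypothetical sketch (not the authors' algorithm) of a generic 1-opt local search over a vector of discrete decision variables: it repeatedly tries to improve the objective by changing one variable at a time and stops when no single-variable change helps, which is exactly the 1-opt condition described in the abstract. The cost function and domains in the usage example are made up for illustration.

```python
from typing import Callable, List, Sequence

def one_opt_search(x: List[int], domains: Sequence[Sequence[int]],
                   cost: Callable[[List[int]], float]) -> List[int]:
    """Generic 1-opt local search: stop when no single-variable change
    can reduce the cost. Illustrative only; the paper's algorithm adds
    scheduling-specific structure and polynomial-time guarantees."""
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            best_v, best_c = x[i], cost(x)
            for v in domains[i]:            # try every value for variable i
                if v == x[i]:
                    continue
                x[i] = v
                c = cost(x)
                if c < best_c:
                    best_v, best_c = v, c
                    improved = True
            x[i] = best_v                   # keep the best single-variable move
    return x

# Toy usage: minimize a made-up latency proxy over three task offsets.
if __name__ == "__main__":
    doms = [range(0, 10)] * 3
    sol = one_opt_search([9, 9, 9], doms,
                         cost=lambda x: abs(x[0] - 3) + abs(x[1] - x[0]) + x[2])
    print(sol)  # a 1-opt solution: no single offset change improves the cost
```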
Page Type-Aware Full-Sequence Program Scheduling via Reinforcement Learning in High Density SSDs
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3444718
Jun Li;Zhigang Cai;Balazs Gerofi;Yutaka Ishikawa;Jianwei Liao
{"title":"Page Type-Aware Full-Sequence Program Scheduling via Reinforcement Learning in High Density SSDs","authors":"Jun Li;Zhigang Cai;Balazs Gerofi;Yutaka Ishikawa;Jianwei Liao","doi":"10.1109/TCAD.2024.3444718","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3444718","url":null,"abstract":"Full-sequence program (FSP) can program multiple bits simultaneously, and thus complete a multiple-page write at one time for naturally enhancing write performance of high density 3-D solid-state drives (SSDs). This article proposes an FSP scheduling approach for the 3-D quad-level cell (QLC) SSDs, to further boost their read responsiveness. Considering each FSP operation in QLC SSDs spans \u0000<monospace>four</monospace>\u0000 different types of QLC pages having dissimilar read latency, we introduce matching four pages of application data to the suited QLC pages and flush them together with the one-shot program of FSP. To this end, we employ reinforcement learning to classify the (cached) application data into \u0000<monospace>four</monospace>\u0000 categories on the basis of their historical access frequency and the associating request size. Thus, the frequently read data can be mapped to the QLC pages having less access latency, meanwhile the other data can be flushed onto the slow QLC pages. Then, we can group four different categories of data pages and flush them together into a four-page unit of 3-D QLC SSDs with an FSP operation. In addition, a proactive rewrite method is also triggered for grouping the hot read data with the cached data to form an FSP unit. Through a series of emulation tests on several realistic disk traces, we show that the proposed mechanisms yields notable performance improvement on the read responsiveness.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3696-3707"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
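Note: as a rough illustration of the page-type-aware idea (not the paper's reinforcement learning agent), the sketch below ranks cached pages by a simple read-hotness score and assigns the hottest data to the fastest QLC page type before a one-shot flush; the page-type names, the latency ordering, and the hotness formula are assumptions for the example.

```python
from dataclasses import dataclass

# Assumed QLC page types ordered from fastest to slowest read latency (illustrative).
PAGE_TYPES_FAST_TO_SLOW = ["L", "CL", "CT", "T"]

@dataclass
class CachedPage:
    lba: int
    read_count: int   # historical read frequency
    req_size: int     # size of the requests that touched it

def hotness(p: CachedPage) -> float:
    # Toy score: frequently read data with small requests benefits most from a fast page.
    return p.read_count / max(p.req_size, 1)

def form_fsp_unit(cached: list[CachedPage]) -> dict[str, CachedPage]:
    """Pick four cached pages and map the hottest to the fastest page type."""
    group = sorted(cached, key=hotness, reverse=True)[:4]
    return dict(zip(PAGE_TYPES_FAST_TO_SLOW, group))

if __name__ == "__main__":
    cache = [CachedPage(lba=i, read_count=(i * 7) % 13, req_size=4 + i) for i in range(8)]
    for page_type, page in form_fsp_unit(cache).items():
        print(f"QLC page {page_type} <- LBA {page.lba} (hotness {hotness(page):.2f})")
```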
GEAR: Graph-Evolving Aware Data Arranger to Enhance the Performance of Traversing Evolving Graphs on SCM
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3447222
Wen-Yi Wang;Chun-Feng Wu;Yun-Chih Chen;Tei-Wei Kuo;Yuan-Hao Chang
{"title":"GEAR: Graph-Evolving Aware Data Arranger to Enhance the Performance of Traversing Evolving Graphs on SCM","authors":"Wen-Yi Wang;Chun-Feng Wu;Yun-Chih Chen;Tei-Wei Kuo;Yuan-Hao Chang","doi":"10.1109/TCAD.2024.3447222","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447222","url":null,"abstract":"In the era of big data, social network services continuously modify social connections, leading to dynamic and evolving graph data structures. These evolving graphs, vital for representing social relationships, pose significant memory challenges as they grow over time. To address this, storage-class-memory (SCM) emerges as a cost-effective solution alongside DRAM. However, contemporary graph evolution processes often scatter neighboring vertices across multiple pages, causing weak graph spatial locality and high-TLB misses during traversals. This article introduces SCM-Based graph-evolving aware data arranger (GEAR), a joint management middleware optimizing data arrangement on SCMs to enhance graph traversal efficiency. SCM-based GEAR comprises multilevel page allocation, locality-aware data placement, and dual-granularity wear leveling techniques. Multilevel page allocation prevents scattering of neighbor vertices relying on managing each page in a finer-granularity, while locality-aware data placement reserves space for future updates, maintaining strong graph spatial locality. The dual-granularity wear leveler evenly distributes updates across SCM pages with considering graph traversing characteristics. Evaluation results demonstrate SCM-based GEAR’s superiority, achieving 23% to 70% reduction in traversal time compared to state-of-the-art frameworks.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3674-3684"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
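Note: a minimal sketch of the locality-aware placement idea (the structure below is an assumption for illustration, not GEAR's actual layout): each vertex's adjacency list is kept on one logical page together with some reserved slack, so newly added edges stay co-located instead of scattering across pages.

```python
PAGE_CAPACITY = 8   # assumed number of neighbor slots per logical page
RESERVED_SLACK = 2  # slots left empty at bulk-load time for future edges

class VertexPage:
    """Primary page for one vertex's adjacency list, with reserved room to grow."""
    def __init__(self, vertex: int):
        self.vertex = vertex
        self.neighbors: list[int] = []
        self.overflow: list["VertexPage"] = []  # extra pages, used only after slack runs out

    def bulk_load(self, neighbors: list[int]) -> list[int]:
        """Initial placement: fill only up to the slack threshold; return the remainder."""
        take = PAGE_CAPACITY - RESERVED_SLACK
        self.neighbors = list(neighbors[:take])
        return list(neighbors[take:])

    def add_edge(self, dst: int) -> None:
        """Evolving update: first consume the reserved slack, then spill to overflow pages."""
        if len(self.neighbors) < PAGE_CAPACITY:
            self.neighbors.append(dst)
        else:
            if not self.overflow or len(self.overflow[-1].neighbors) == PAGE_CAPACITY:
                self.overflow.append(VertexPage(self.vertex))
            self.overflow[-1].neighbors.append(dst)

    def traverse(self):
        yield from self.neighbors
        for page in self.overflow:
            yield from page.neighbors

if __name__ == "__main__":
    v = VertexPage(0)
    leftover = v.bulk_load(list(range(6)))   # 6 initial neighbors fit under the slack threshold
    for dst in range(6, 12):                 # later graph evolution adds 6 more edges
        v.add_edge(dst)
    print(list(v.traverse()))                # [0..11], mostly co-located on the primary page
```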
A Scalable 2T-1FeFET-Based Content Addressable Memory Design for Energy Efficient Data Search
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3493000
Jiahao Cai;Hamza E. Barkam;Mohsen Imani;Kai Ni;Grace Li Zhang;Bing Li;Ulf Schlichtmann;Cheng Zhuo;Xunzhao Yin
{"title":"A Scalable 2T-1FeFET-Based Content Addressable Memory Design for Energy Efficient Data Search","authors":"Jiahao Cai;Hamza E. Barkam;Mohsen Imani;Kai Ni;Grace Li Zhang;Bing Li;Ulf Schlichtmann;Cheng Zhuo;Xunzhao Yin","doi":"10.1109/TCAD.2024.3493000","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3493000","url":null,"abstract":"Content addressable memory (CAM) is widely used in advanced machine learning models and data-intensive applications for associative search tasks, thanks to the highly parallel pattern matching capability. Most state-of-the-art CAM designs primarily aim to reduce the CAM cell area by utilizing nonvolatile memories (NVMs). However, there has been limited research on optimizing the design and energy efficiency of NVM-based CAMs for practical deployment in edge devices and AI hardware. This article introduces a general compact and energy efficient CAM design scheme that minimizes design overhead by using only one NVM device per cell. Our proposed CAM design realizes both binary CAM (BCAM) and multibit CAM (MCAM) by leveraging the binary and multilevel storage property of NVM devices without additional cell overheads. Additionally, we propose an adaptive matchline (ML) precharge and discharge scheme to further optimize search energy by significantly reducing the ML voltage swing. Ferroelectric field-effect transistors (FeFETs) serve as representative NVMs in our proposed design, and we present a 2T-1FeFET CAM array incorporating a sense amplifier that implements the proposed ML scheme. Evaluation results show that our proposed 2T-1FeFET BCAM design achieves energy efficiency improvements of <inline-formula> <tex-math>$6.64times $ </tex-math></inline-formula>/<inline-formula> <tex-math>$4.74times $ </tex-math></inline-formula>/<inline-formula> <tex-math>$9.14times $ </tex-math></inline-formula>/<inline-formula> <tex-math>$3.02times $ </tex-math></inline-formula> compared to CMOS/ReRAM/STT-MRAM/2FeFET BCAM arrays, while 2T-1FeFET MCAM design achieves <inline-formula> <tex-math>$8.25times $ </tex-math></inline-formula>/<inline-formula> <tex-math>$5.68times $ </tex-math></inline-formula>/<inline-formula> <tex-math>$56.35times $ </tex-math></inline-formula> better-energy efficiency compared to ReRAM/3T-1FeFET/1FeFET-1R MACM arrays. Benchmarking results demonstrate that our BCAM/MCAM approach provides <inline-formula> <tex-math>$3.2times $ </tex-math></inline-formula>/<inline-formula> <tex-math>$3.7times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$2.0times $ </tex-math></inline-formula>/<inline-formula> <tex-math>$2.2times $ </tex-math></inline-formula> energy-delay product improvement over the 2T-2R and 2FeFET CAM in accelerating query processing applications.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1760-1773"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
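Note: for readers unfamiliar with CAM semantics, here is a tiny behavioral model (purely illustrative, no circuit detail) of the operations a CAM accelerates: every stored word is compared against the query in one logical step, returning exact matches for the binary case and the closest entry for the multibit/approximate case.

```python
def bcam_search(stored_words: list[int], query: int, width: int = 8) -> list[int]:
    """Behavioral model of a binary CAM lookup: return indices of rows whose
    stored word matches the query exactly. In hardware all rows compare in
    parallel on the matchlines; here we simply loop."""
    mask = (1 << width) - 1
    return [i for i, w in enumerate(stored_words) if (w ^ query) & mask == 0]

def mcam_nearest(stored_words: list[int], query: int, width: int = 8) -> int:
    """Approximate variant: return the row with the smallest Hamming distance
    to the query (ties broken by index)."""
    mask = (1 << width) - 1
    return min(range(len(stored_words)),
               key=lambda i: bin((stored_words[i] ^ query) & mask).count("1"))

if __name__ == "__main__":
    table = [0b1010_1100, 0b1111_0000, 0b1010_1101]
    print(bcam_search(table, 0b1010_1100))   # -> [0]
    print(mcam_nearest(table, 0b1010_1111))  # -> 2 (closest by Hamming distance)
```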
A Homogeneous FeFET-Based Time-Domain Compute-in-Memory Fabric for Matrix-Vector Multiplication and Associative Search
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3492994
Xunzhao Yin;Qingrong Huang;Hamza Errahmouni Barkam;Franz Müller;Shan Deng;Alptekin Vardar;Sourav De;Zhouhang Jiang;Mohsen Imani;Ulf Schlichtmann;Xiaobo Sharon Hu;Cheng Zhuo;Thomas Kämpfe;Kai Ni
{"title":"A Homogeneous FeFET-Based Time-Domain Compute-in-Memory Fabric for Matrix-Vector Multiplication and Associative Search","authors":"Xunzhao Yin;Qingrong Huang;Hamza Errahmouni Barkam;Franz Müller;Shan Deng;Alptekin Vardar;Sourav De;Zhouhang Jiang;Mohsen Imani;Ulf Schlichtmann;Xiaobo Sharon Hu;Cheng Zhuo;Thomas Kämpfe;Kai Ni","doi":"10.1109/TCAD.2024.3492994","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3492994","url":null,"abstract":"Matrix-vector multiplication (MVM) and content-based search are two key operations in many machine learning workloads. This article proposes a ferroelectric FET (FeFET) time-domain compute-in-memory (TD-CiM) array that can accelerate both operations in a homogeneous fabric. We demonstrate that 1) the AND and xor/XNOR logic functions required by MVM and content-based search can be realized using a single compute-in-memory (CiM) cell composed of 2FeFETs connected in series; 2) an inverter chain-based TD-CiM array along with a two-phase time-domain computation principle of the TD-CiM can be employed to implement the MVM and content-based search functions; 3) a signal delay-to-digital output conversion can be implemented by associating a loading capacitor with each stage of the inverter chain-based TD-CiM array, ensuring the full digital compatibility; and 4) the proposed 2FeFET cell and inverter chain-based TD-CiM array are robust against FeFET variation according to our comprehensive theoretical and experimental validation. We show how the FeFET TD-CiM can be exploited to accelerate hyperdimensional computing (HDC) and adjusted to process different tasks through dynamic and fine-grained resource allocation. HDC application benchmarking results show that the proposed FeFET-based TD-CiM offers on average <inline-formula> <tex-math>$106times $ </tex-math></inline-formula>/<inline-formula> <tex-math>$63times $ </tex-math></inline-formula> energy reduction/speedup compared to GPU-based implementation. With more than 8500 TOPS/W energy-efficiency, the proposed FeFET-based TD-CiM exhibits huge potential as a processing fabric for various memory-intensive applications.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 5","pages":"1856-1868"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
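Note: to connect the two operations the fabric supports, the sketch below shows, in plain software with assumed binary weights and inputs, how AND plus accumulation realizes a binary matrix-vector product and how XNOR plus bit counting realizes an associative search; the time-domain and analog aspects of the paper are not modeled.

```python
def binary_mvm(weight_rows: list[int], x: int, width: int) -> list[int]:
    """Binary matrix-vector multiply: y_j = popcount(w_j AND x)."""
    mask = (1 << width) - 1
    return [bin(w & x & mask).count("1") for w in weight_rows]

def xnor_search(stored: list[int], query: int, width: int) -> int:
    """Associative search: return the index with the most matching bits,
    i.e. the largest popcount of XNOR(stored, query)."""
    mask = (1 << width) - 1
    return max(range(len(stored)),
               key=lambda i: bin(~(stored[i] ^ query) & mask).count("1"))

if __name__ == "__main__":
    W = [0b1011, 0b0110, 0b1111]
    print(binary_mvm(W, 0b1010, width=4))   # -> [2, 1, 2]
    print(xnor_search(W, 0b1011, width=4))  # -> 0 (exact match)
```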
OPIMA: Optical Processing-in-Memory for Convolutional Neural Network Acceleration
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3446870
Febin Sunny;Amin Shafiee;Abhishek Balasubramaniam;Mahdi Nikdast;Sudeep Pasricha
{"title":"OPIMA: Optical Processing-in-Memory for Convolutional Neural Network Acceleration","authors":"Febin Sunny;Amin Shafiee;Abhishek Balasubramaniam;Mahdi Nikdast;Sudeep Pasricha","doi":"10.1109/TCAD.2024.3446870","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3446870","url":null,"abstract":"Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, dynamic random-access memory-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. In addition, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve \u0000<inline-formula> <tex-math>$2.98times $ </tex-math></inline-formula>\u0000 higher throughput and \u0000<inline-formula> <tex-math>$137times $ </tex-math></inline-formula>\u0000 better energy efficiency than the best known prior work.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3888-3899"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Enhancing SRAM-Based PUF Reliability Through Machine Learning-Aided Calibration Techniques
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3449570
Kuheli Pratihar;Soumi Chatterjee;Rajat Subhra Chakraborty;Debdeep Mukhopadhyay
{"title":"Enhancing SRAM-Based PUF Reliability Through Machine Learning-Aided Calibration Techniques","authors":"Kuheli Pratihar;Soumi Chatterjee;Rajat Subhra Chakraborty;Debdeep Mukhopadhyay","doi":"10.1109/TCAD.2024.3449570","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3449570","url":null,"abstract":"Static random access memory (SRAM)-based physically unclonable functions (PUFs) utilize unpredictable start-up values (SUVs) for key generation, making them widely adopted in cryptographic systems. This unpredictability in SUVs is accompanied by device noise that escalates with process-voltage–temperature (PVT) variations, resulting in significant deviations from the golden response collected at ambient conditions, thereby increasing the bit-error-rate (BER) of the PUF responses. To reduce this high-\u0000<inline-formula> <tex-math>$(geq 15%)$ </tex-math></inline-formula>\u0000 BER, either an involved error correcting code (ECC) circuitry with significant overhead is required, or more helper information needs to be generated at varying operating conditions, resulting in increased information leakage. We address this issue by proposing the first reported application of machine learning to recalibrate the responses by predicting the golden responses of the SRAM-based PUF (SRAM-PUF) at different operating conditions with high accuracy. Our recalibration technique is based on a novel collective decision that involves observing the neighborhood cells of the SRAM-PUF, as opposed to the traditional single-cell approach. By leveraging a memory map exhibiting a high correlation in ambient reliability amongst neighboring cells, we indirectly use the physical co-location of SRAM cells to assist neighborhood error prediction. It leads to efficient post-processing for SRAM-PUFs by using helper data generated at ambient conditions only while employing a fixed ECC designed for the same. Subsequently, to justify our claims and validate the efficacy of our proposed methodology, we demonstrate extensive experimentation results over multiple SRAM-PUF instances implemented on the Arduino UNO (an 8-bit microcontroller unit) and its scaled-up version, the Arduino Zero (a 32-bit microcontroller unit) boards, by varying supply voltages from 3.8 to 6.2 V and 7 to 12 V, respectively, and temperature from −25° to 70° C in both cases. Our observations show a vast drop in BER from 17.02% to \u0000<inline-formula> <tex-math>$approx 1%$ </tex-math></inline-formula>\u0000. Although worst-case conditions with both voltage and temperature variations at play resulted in a BER of 20%, using our proposed approach reduces it to \u0000<inline-formula> <tex-math>$approx 1{text {-}} 2%$ </tex-math></inline-formula>\u0000, in turn demonstrating the high efficacy of our scheme.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3491-3502"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
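Note: the neighborhood idea can be illustrated with a deliberately simple stand-in for the paper's learned model: instead of trusting a noisy cell in isolation, a predictor (here a plain majority vote over adjacent cells, which is an assumption, not the authors' ML model) estimates the golden start-up value from the cell's 3x3 neighborhood readout. The toy golden map is given artificial spatial structure so the vote is meaningful; real SUV maps are cell-specific.

```python
import numpy as np

def recalibrate(noisy: np.ndarray, predictor) -> np.ndarray:
    """Predict each cell's golden start-up value from its 3x3 neighborhood.
    `predictor` maps a length-9 readout vector to 0/1; in the paper this is
    a trained ML model, here it is a toy majority vote."""
    padded = np.pad(noisy, 1, mode="edge")
    out = np.zeros_like(noisy)
    rows, cols = noisy.shape
    for r in range(rows):
        for c in range(cols):
            out[r, c] = predictor(padded[r:r + 3, c:c + 3].reshape(-1))
    return out

def majority_vote(window: np.ndarray) -> int:
    # Stand-in for the learned classifier: follow the neighborhood majority.
    return int(window.sum() > window.size // 2)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy golden map with 4-wide diagonal stripes (spatially correlated on purpose).
    golden = ((np.add.outer(np.arange(32), np.arange(32)) // 4) % 2).astype(int)
    noisy = golden ^ (rng.random(golden.shape) < 0.15).astype(int)  # ~15% bit errors
    fixed = recalibrate(noisy, majority_vote)
    print("raw BER:", (noisy != golden).mean(),
          "recalibrated BER:", (fixed != golden).mean())
```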
LightFS: A Lightweight Host-CSD Coordinated File System Optimizing for Heavy Small File Accesses
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3443010
Jiali Li;Zhaoyan Shen;Duo Liu;Xianzhang Chen;Kan Zhong;Zhaoyang Zeng;Yujuan Tan
{"title":"LightFS: A Lightweight Host-CSD Coordinated File System Optimizing for Heavy Small File Accesses","authors":"Jiali Li;Zhaoyan Shen;Duo Liu;Xianzhang Chen;Kan Zhong;Zhaoyang Zeng;Yujuan Tan","doi":"10.1109/TCAD.2024.3443010","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443010","url":null,"abstract":"Computational storage drive (CSD) improves the data processing efficiency by processing the data within the storage. However, existing CSDs rely on the host-centric file systems to manage the data, where the layouts of files are retrieved by the host and sent to the CSD, resulting in additional I/O overhead and reduced processing efficiency, especially in heavy small file accesses. Moreover, the lack of consistency mechanisms poses potential consistency issues. To address these challenges, we propose LightFS, a lightweight host-CSD coordinated file system for the CSD file management. To reduce task offloading overhead, LightFS builds an index file \u0000<inline-formula> <tex-math>$.ndpmeta$ </tex-math></inline-formula>\u0000 which summarizes the files’ metadata and shares between the host and CSD to enable CSD to retrieve the file layout in storage directly. To ensure consistency, LightFS employs a metadata locker and an update synchronizer. The metadata locker leverages the out-of-place update feature of the flash to capture a snapshot of the file to be written without any data copy, while the update synchronizer triggers metadata updates by monitoring the addresses of written blocks to ensure that the modified file is successfully written to the CSD. We implement and evaluate LightFS on a real testbed, and the results demonstrate that LightFS achieves \u0000<inline-formula> <tex-math>$3.66times $ </tex-math></inline-formula>\u0000 performance improvement on the average in real-world operations.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3527-3538"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
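Note: a rough sketch of the shared-index idea (the record layout below is an assumption for illustration; the paper does not specify .ndpmeta's format): the host maintains one compact record per file describing its on-flash extents, and the CSD can then resolve a file to physical block addresses without asking the host.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class FileLayout:
    """One illustrative .ndpmeta record: where a file's data lives on flash,
    so the CSD can read it without host involvement."""
    path: str
    size: int
    extents: list[tuple[int, int]]  # (start_block, block_count) pairs

class NdpMetaIndex:
    def __init__(self):
        self._records: dict[str, FileLayout] = {}

    def update(self, layout: FileLayout) -> None:
        # Host side: refresh the record whenever the file's layout changes.
        self._records[layout.path] = layout

    def serialize(self) -> str:
        # Host writes this blob to the shared .ndpmeta file on the CSD.
        return json.dumps({p: asdict(r) for p, r in self._records.items()})

    @staticmethod
    def lookup(blob: str, path: str) -> list[tuple[int, int]]:
        # CSD side: resolve a path directly to flash extents.
        rec = json.loads(blob).get(path)
        return [tuple(e) for e in rec["extents"]] if rec else []

if __name__ == "__main__":
    idx = NdpMetaIndex()
    idx.update(FileLayout("/data/log.0", size=8192, extents=[(1024, 2)]))
    blob = idx.serialize()
    print(NdpMetaIndex.lookup(blob, "/data/log.0"))  # -> [(1024, 2)]
```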
FIRM-Tree: A Multidimensional Index Structure for Reprogrammable Flash Memory
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3445809
Shin-Ting Wu;Pin-Jung Chen;Po-Chun Huang;Wei-Kuan Shih;Yuan-Hao Chang
{"title":"FIRM-Tree: A Multidimensional Index Structure for Reprogrammable Flash Memory","authors":"Shin-Ting Wu;Pin-Jung Chen;Po-Chun Huang;Wei-Kuan Shih;Yuan-Hao Chang","doi":"10.1109/TCAD.2024.3445809","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3445809","url":null,"abstract":"For many emerging data-centric computing applications, it is a key capability to efficiently store, manage, and access multidimensional data. To achieve this, many multidimensional index data structures have been proposed. However, when existing multidimensional index data structures are maintained on modern nonvolatile memories (NVMs), such as NAND flash memory, they often face challenges in effective management of multidimensional data and handling of memory medium peculiarities, such as the write-once property and the need for block reclamation of NAND flash memory. Without appropriate management, these challenges often result in serious amplification of the read/write traffic, which degrades the performance of multidimensional data structures. Motivated by the urgent needs of efficient multidimensional index data structures on modern NVMs, we propose the FIRM-tree, a time-efficient and space-economic index data structure for multidimensional point data on NAND flash memory. Unique to the prior work, the FIRM-tree holistically utilizes RAM and flash memory space, and dedicatedly leverages the page reprogrammability of modern NAND flash memory, to enhance data access performance and flash management overheads. We then verify our proposal through analytical and experimental studies, where the results are quite encouraging.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3600-3613"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Thread Carefully: Preventing Starvation in the ROS 2 Multithreaded Executor
IF 2.7, CAS Tier 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2024-11-06. DOI: 10.1109/TCAD.2024.3446865
Harun Teper;Daniel Kuhse;Mario Günzel;Georg von der Brüggen;Falk Howar;Jian-Jia Chen
{"title":"Thread Carefully: Preventing Starvation in the ROS 2 Multithreaded Executor","authors":"Harun Teper;Daniel Kuhse;Mario Günzel;Georg von der Brüggen;Falk Howar;Jian-Jia Chen","doi":"10.1109/TCAD.2024.3446865","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3446865","url":null,"abstract":"The robot operating system 2 (ROS 2) is a widely used collection of tools and libraries for building robot applications. It is designed to be flexible and easy to use when creating complex robot systems with many interacting components.Since its alpha version release in 2015, ROS 2 provides two options in a multithreading operating system, namely the single-threaded executor and the multithreaded executor. The single-threaded executor is starvation-free by design (i.e., every task is eventually executed) even in over-utilized systems, since the set of eligible task instances (called wait set) is only refilled once all the task instances in the wait set are executed. The multithreaded executor extends this mechanism to multiple threads that manage the wait set collaboratively. While intuitively this extension preserves the starvation-free property, and analyses for the multithreaded executor even build upon this assumption, the multithreaded executor has not been shown to be starvation-free.In this work, we examine the mechanism of the multithreaded executor in ROS 2 and demonstrate that it is prone to starvation, i.e., some tasks may never be executed even in under-utilized systems. This indicates risks for multithreaded executors in the current ROS 2 design and further leads to counterexamples to the state-of-the-art response-time analyses by Jiang et al. (RTSS 2022) and Sobhani et al. (RTAS 2023). We propose a minimal change in the software architecture of the ROS 2 multithreaded executor to enable starvation- and deadlock-free behavior. We empirically test that we prevent starvation in concrete ROS 2 system configurations, and show that our solution incurs a negligible overhead using the autoware reference benchmark. Moreover, we prove that our solution is starvation- and deadlock-free using formal proofs and model checking.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3588-3599"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10745787","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
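Note: to make the wait-set mechanism concrete, here is a simplified model (an assumption-level sketch, not ROS 2 code) of the single-threaded executor's refill policy described in the abstract: ready callbacks are snapshotted into a wait set, and the set is only refilled after every entry in the current snapshot has been executed, which is what makes the single-threaded executor starvation-free.

```python
from collections import deque

class SingleThreadedExecutorModel:
    """Toy model of the wait-set refill policy: the wait set is refilled only
    after all callbacks in the current snapshot have run, so every ready
    callback is eventually executed."""
    def __init__(self, callbacks):
        self.callbacks = callbacks      # name -> callable
        self.ready = set()              # callbacks whose triggers have fired
        self.wait_set = deque()         # current snapshot being drained

    def trigger(self, name: str) -> None:
        self.ready.add(name)

    def spin_some(self) -> None:
        if not self.wait_set:                      # refill only when the snapshot is empty
            self.wait_set.extend(sorted(self.ready))
            self.ready.clear()
        while self.wait_set:
            name = self.wait_set.popleft()
            self.callbacks[name]()                 # even low-priority callbacks get their turn

if __name__ == "__main__":
    ex = SingleThreadedExecutorModel({
        "hot_timer": lambda: print("hot_timer"),
        "rare_subscription": lambda: print("rare_subscription"),
    })
    ex.trigger("hot_timer")
    ex.trigger("rare_subscription")
    ex.spin_some()   # both run before any refill: no starvation
```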