IEEE Transactions on Computers: Latest Articles

Hierarchical Hashing: A Dynamic Hashing Method With Low Write Amplification and High Performance for Non-Volatile Memory
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-16 DOI: 10.1109/TC.2024.3517737
Jinquan Wang;Zhisheng Huo;Limin Xiao;Jinqian Yang;Jiantong Huo;Minyi Guo
Abstract: The hashing method is widely used as the index structure, which can be stored in NVM to improve the application performance. However, existing hashing methods may cause high extra write amplification to NVM and bring high additional storage overhead on NVM while providing low request performance. To solve these problems, we have proposed a dynamic hashing method called Hierarchical Hashing, whose basic idea is to leverage a novel hash collision resolution mechanism that can dynamically expand the size of the hash table. Hierarchical Hashing can incur no extra write amplification to NVM when resolving hash collisions. Additionally, it can directly address all cells when resizing the hash table, thereby avoiding the additional storage overhead caused by non-addressable linked lists. Furthermore, the request performance can be improved as all cells of the hash table are addressable when resizing to resolve hash collisions. The experimental results demonstrate that Hierarchical Hashing brings no extra write amplification to NVM and achieves nearly 90% space utilization and high request performance while providing 99% memory utilization, compared with existing representative hashing methods.
Vol. 74, No. 4, pp. 1138-1151. Publication type: Journal Article.
Citations: 0
AutoPipe-H: A Heterogeneity-Aware Data-Paralleled Pipeline Approach on Commodity GPU Servers
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-16 DOI: 10.1109/TC.2024.3517748
Weijie Liu;Kai Lu;Zhiquan Lai;Shengwei Li;Keshi Ge;Dongsheng Li;Xicheng Lu
Abstract: Recently, the data-parallel pipeline approach has been widely used in training DNN models on commodity GPU servers. However, there are still three challenges for hybrid parallelism on commodity GPU servers: i) a balanced model partition is crucial for efficiency, whereas prior works lack a sound solution to generate a balanced partition automatically; ii) an orchestrated device mapping is essential to reduce communication contention, however, prior works ignore server heterogeneity, exacerbating communication contention; iii) the startup overhead is inevitable and especially significant for deep pipelines, which is an essential source of pipeline bubbles and severely affects pipeline scalability. We propose AutoPipe-H to solve these three problems, which contains i) a pipeline partitioner component for automatically and quickly generating a balanced sub-block partition scheme; ii) a device mapping component that assigns pipeline stages to devices, considering server heterogeneity, to reduce communication contention; and iii) a distributed training runtime component that reduces pipeline startup overhead by splitting the micro-batch evenly. The experimental results show that AutoPipe-H can accelerate training by up to 1.26x over the hybrid parallelism framework DAPPLE and Piper, with a 2.73x-12.7x improvement in the partition balance and an order-of-magnitude time reduction in partition scheme searching.
Vol. 74, No. 4, pp. 1196-1209. Publication type: Journal Article.
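The "balanced partition" problem named in this abstract can be illustrated with a textbook formulation: split a sequence of per-layer costs into contiguous pipeline stages so that the slowest stage is as fast as possible. The sketch below is a generic dynamic program for that formulation, not AutoPipe-H's partitioner; the function name `balanced_partition` and the example costs are hypothetical, and it assumes there are at least as many layers as stages.

```python
from itertools import accumulate


def balanced_partition(costs, stages):
    """Split `costs` into `stages` contiguous groups, minimising the largest group sum.

    Generic linear-partition dynamic program (illustrative only, not the
    algorithm from the paper). Assumes len(costs) >= stages >= 1.
    """
    n = len(costs)
    prefix = [0] + list(accumulate(costs))  # prefix[i] = sum of the first i costs
    inf = float("inf")
    # best[k][i]: smallest bottleneck when the first i layers form k stages
    best = [[inf] * (n + 1) for _ in range(stages + 1)]
    cut = [[0] * (n + 1) for _ in range(stages + 1)]
    best[0][0] = 0
    for k in range(1, stages + 1):
        for i in range(k, n + 1):
            for j in range(k - 1, i):  # the last stage covers layers j..i-1
                bottleneck = max(best[k - 1][j], prefix[i] - prefix[j])
                if bottleneck < best[k][i]:
                    best[k][i] = bottleneck
                    cut[k][i] = j
    # Walk the recorded cut points backwards to recover the stage boundaries.
    bounds = []
    i = n
    for k in range(stages, 0, -1):
        j = cut[k][i]
        bounds.append((j, i))
        i = j
    bounds.reverse()
    return best[stages][n], [costs[a:b] for a, b in bounds]


bottleneck, groups = balanced_partition([1, 2, 3, 4, 5], 2)
print(bottleneck, groups)
```

The cubic-time search is acceptable for a few hundred layers; the abstract's claim of an order-of-magnitude faster search suggests the paper uses something more specialised, which this sketch does not attempt to reproduce.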
Citations: 0
A GPU-Enabled Framework for Light Field Efficient Compression and Real-Time Rendering
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-16 DOI: 10.1109/TC.2024.3517743
Mingyuan Zhao;Hao Sheng;Rongshan Chen;Ruixuan Cong;Tun Wang;Zhenglong Cui;Da Yang;Shuai Wang;Wei Ke
Abstract: Real-time rendering offers instantaneous visual feedback, making it crucial for mixed-reality applications. The light field captures both light intensity and direction in a 3D environment, serving as a data-rich medium to enhance mixed-reality experiences. However, two major challenges remain: 1) current light field rendering techniques are unsuitable for real-time computation, and 2) existing real-time methods cannot efficiently process high-dimensional light field data on GPU platforms. To overcome these challenges, we propose a framework utilizing a compact neural representation of light field data, implemented on a GPU platform for real-time rendering. This framework provides both compact storage and high-fidelity real-time computation. Specifically, we introduce a ray global alignment strategy to simplify the framework and improve practicality. This strategy enables the learning of an optimal embedding for all local rays in a globally consistent way, removing the need for camera pose calculations. To achieve effective compression, the neural light field is employed to map each embedded ray to its corresponding color. To enable real-time rendering, we design a novel super-resolution network to enhance rendering speed. Extensive experiments demonstrate that our framework significantly enhances compression efficiency and real-time rendering performance, achieving a nearly 50× compression ratio and 100 FPS rendering.
Vol. 74, No. 4, pp. 1168-1181. Publication type: Journal Article.
Citations: 0
Hardware Accelerated Vision Transformer via Heterogeneous Architecture Design and Adaptive Dataflow Mapping
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-16 DOI: 10.1109/TC.2024.3517751
Yingxue Gao;Teng Wang;Lei Gong;Chao Wang;Dong Dai;Yang Yang;Xianglan Chen;Xi Li;Xuehai Zhou
Abstract: Vision transformer (ViT) models have demonstrated remarkable advantages in visual tasks. However, the ViT model contains various types of operators, and its sophisticated model structure imposes substantial computational complexity and storage burden. Existing hardware solutions still fail to fully unleash the ViT acceleration potential due to the mismatch between operators and hardware architectures, suffering from inefficient dataflow mapping. This work proposes HDViT, a full-fledged heterogeneous hardware accelerator on FPGA, to enhance the ViT acceleration by comprehensively analyzing and addressing the challenges of heterogeneous architecture design. Specifically, HDViT first develops a heterogeneous architecture design that is composed of multiple processing engines (PEs) to accelerate various operators in the ViT model. Then, HDViT devises a hybrid-oriented dataflow mapping strategy to reduce data transmission granularity and alleviate storage resource pressure. Lastly, to achieve the latency balancing among multiple PEs, we formulate the HDViT architecture and implement an automated exploration process to identify optimized parallelism parameters that satisfy computation and storage demands while enhancing the heterogeneous architectural performance. Experimental results indicate that HDViT achieves significant performance speedups of 2.16× and 3.51× compared to previous heterogeneous and unified accelerators, respectively. HDViT also achieves a maximum of 98.46% hardware utilization.
Vol. 74, No. 4, pp. 1224-1238. Publication type: Journal Article.
Citations: 0
A Context-Awareness and Hardware-Friendly Sparse Matrix Multiplication Kernel for CNN Inference Acceleration
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-16 DOI: 10.1109/TC.2024.3517745
Haotian Wang;Yan Ding;Yumeng Liu;Weichen Liu;Chubo Liu;Wangdong Yang;Kenli Li
Abstract: Sparsification technology is crucial for deploying convolutional neural networks in resource-constrained environments. However, the efficiency of sparse models is hampered by irregular memory access patterns in sparse matrix multiplication kernels. Hardware-level support for 2:4 granularity in sparse tensor cores presents an opportunity for designing efficient sparse matrix multiplication kernels. Existing approaches often involve adjusting sparse structures or secondary sparsification, introducing additional computational errors. To tackle this challenge, we introduce a flexible 2:4 structured adaptive sparse matrix multiplication (FS-AMM) method, a hardware-friendly sparse matrix multiplication kernel that leverages model context to accelerate convolutional neural networks. First, we propose a model context-aware matrix pre-processing method that employs heuristic algorithms to estimate a loss of accuracy due to weight sparsity at each layer. Second, we design a hardware-friendly sparse storage format that combines 2:4 sparse and dense storage formats, enabling more versatile sparsity ratio selection. Third, we implement efficient matrix multiplication kernels to optimize GPU utilization. Finally, experimental results on A100 GPUs show that our method effectively utilizes the sparse tensor kernel and obtains an average 3.09 times speedup ratio compared to other sparse methods while maintaining a high accuracy.
Vol. 74, No. 4, pp. 1182-1195. Publication type: Journal Article.
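For readers unfamiliar with the 2:4 granularity mentioned in this abstract: in every group of four consecutive weights, at most two are non-zero, so a row can be stored as the kept values plus a small in-group position for each. The sketch below illustrates only that general idea with plain Python lists; it is not the FS-AMM storage format or kernel, and the function names `to_2_4` and `sparse_dot` are hypothetical. Keeping the two largest-magnitude entries per group is one common pruning choice, assumed here for illustration.

```python
def to_2_4(row):
    """Compress a row into 2:4 form: two kept values and their positions per group of four.

    Illustrative only: keeps the two largest-magnitude entries of each group.
    """
    if len(row) % 4:
        raise ValueError("row length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes, restored to ascending order.
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        values.extend(group[i] for i in keep)
        positions.extend(keep)
    return values, positions


def sparse_dot(values, positions, x):
    """Dot product of a 2:4-compressed row with a dense vector x."""
    total = 0.0
    for n, (v, p) in enumerate(zip(values, positions)):
        group_start = (n // 2) * 4  # two stored values per group of four
        total += v * x[group_start + p]
    return total


row = [0.1, -2.0, 0.3, 4.0, 1.0, 0.0, -0.5, 0.2]
values, positions = to_2_4(row)
print(values, positions, sparse_dot(values, positions, [1.0] * 8))
```

Each position fits in two bits, which is why the format halves the stored weights at a small metadata cost; how the paper mixes this with dense storage is not shown here.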
Citations: 0
A Data-Centric Software-Hardware Co-Designed Architecture for Large-Scale Graph Processing
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-09 DOI: 10.1109/TC.2024.3514292
Zerun Li;Xiaoming Chen;Yuxin Yang;Feng Min;Xiaoyu Zhang;Yinhe Han
Abstract: Graph processing plays an important role in many practical applications. However, the inherent characteristics of graph processing, including random memory access and the low computation-to-communication ratio, make it difficult to efficiently execute on traditional computing architectures, such as CPUs and GPUs. Near-memory computing has the characteristics of low latency and high bandwidth. It is widely regarded as a promising direction for designing graph processing accelerators. However, the storage space of a single device cannot meet the demand of large-scale graph processing. Using multiple devices will bring lots of inter-device data transmission, which may counteract the benefits of near-memory computing. To fundamentally reduce the data transmission overhead, we propose a data-centric graph processing framework for systems with multiple near-memory computing devices. The framework uses a data-centric programming model as the software-hardware interface. For software, we propose an optimized data flow and a heuristic multi-step weighted maximum matching algorithm to achieve efficient inter-device communication and ensure load balancing. For hardware, we design a data reuse driven task controller and a data type-aware on-chip memory, which can effectively improve the utilization of the on-chip memory. Compared with the two most recent near-memory graph accelerators, our framework significantly reduces energy consumption and inter-device communication.
Vol. 74, No. 4, pp. 1109-1122. Publication type: Journal Article.
Citations: 0
Protecting the CCSDS 123.0-B-2 Compression Algorithm Against Single-Event Upsets for Space Applications
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-09 DOI: 10.1109/TC.2024.3512203
Daniel Báscones;Francisco García-Herrero;Óscar Ruano;Carlos González;Daniel Mozos;Juan Antonio Maestro
Abstract: Hyperspectral imaging is an excellent tool to remotely analyze the Earth from in-orbit devices. Satellites capture these images containing vast information about the ground pixels. To optimize storage and transmission speeds, compression is often performed onboard the satellite. To that end, algorithms such as the CCSDS 123.0-B-2 are implemented on FPGAs, enabling this process in an efficient and fast manner. Single-Event Upsets (SEU) are commonplace in this scenario, e.g. bit flips in the FPGA's configuration memory which can catastrophically alter the algorithm's output. In this paper, we propose a fault tolerance technique for this specific case. The compression core is checked periodically by running a golden model designed to excite the full internal datapath based on a synthetic image. A failure in this check will trigger a reconfiguration of the compression core. Results show better detection rates than Dual Modular Redundancy (DMR) at a fraction of the resource cost, proving this technique as a viable alternative. Furthermore, other algorithms with similar processing flows might benefit as well from this technique.
Vol. 74, No. 3, pp. 944-954. Publication type: Journal Article. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10785575
Citations: 0
Towards Optimal Customized Architecture for Heterogeneous Federated Learning With Contrastive Cloud-Edge Model Decoupling
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-09 DOI: 10.1109/TC.2024.3514302
Xingyan Chen;Tian Du;Mu Wang;Tiancheng Gu;Yu Zhao;Gang Kou;Changqiao Xu;Dapeng Oliver Wu
Abstract: Federated learning, as a promising distributed learning paradigm, enables collaborative training of a global model across multiple network edge clients without the need for central data collecting. However, the heterogeneity of edge data distribution drags the model towards the local minima, which can be distant from the global optimum. Such heterogeneity often leads to slow convergence and substantial communication overhead. To address these issues, we propose a novel federated learning framework called FedCMD, a model decoupling tailored to the Cloud-edge supported federated learning that separates deep neural networks into a body for capturing shared representations in Cloud and a personalized head for migrating data heterogeneity. Our motivation is that, by the deep investigation of the performance of selecting different neural network layers as the personalized head, we found rigidly assigning the last layer as the personalized head in current studies is not always optimal. Instead, it is necessary to dynamically select the personalized layer that maximizes the training performance by taking the representation difference between neighbor layers into account. To find the optimal personalized layer, we utilize the low-dimensional representation of each layer to contrast feature distribution transfer and introduce a Wasserstein-based layer selection method, aimed at identifying the best-match layer for personalization. Additionally, a weighted global aggregation algorithm is proposed based on the selected personalized layer for the practical application of FedCMD. Extensive experiments on ten benchmarks demonstrate the efficiency and superior performance of our solution compared with nine state-of-the-art solutions. All code and results are available at https://github.com/elegy112138/FedCMD.
Vol. 74, No. 4, pp. 1123-1137. Publication type: Journal Article.
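The Wasserstein distance used for layer selection in this abstract has a simple closed form in one dimension: for two equal-sized samples, it is the mean absolute difference between their sorted values. The sketch below shows that formula and one possible way to compare neighboring layers with it. The selection rule (pick the layer after the largest jump) is an assumption made purely for illustration, not FedCMD's published criterion, and `wasserstein_1d`, `pick_layer`, and the toy features are hypothetical.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D empirical samples."""
    if not a or len(a) != len(b):
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def pick_layer(layer_features):
    """Return (index, gaps), where gaps[i] compares layer i with layer i + 1.

    ASSUMPTION for illustration: the chosen layer is the one that follows the
    largest distribution shift between neighbors. The paper's actual rule may differ.
    """
    gaps = [
        wasserstein_1d(layer_features[i], layer_features[i + 1])
        for i in range(len(layer_features) - 1)
    ]
    largest = max(range(len(gaps)), key=gaps.__getitem__)
    return largest + 1, gaps


# Toy 1-D projections of four layers' activations (made-up numbers).
features = [[0, 1, 2], [0.5, 1.5, 2.5], [5, 6, 7], [5, 6, 8]]
print(pick_layer(features))
```

Real layer representations are high-dimensional, which is presumably why the abstract mentions a low-dimensional representation of each layer before the distributions are contrasted.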
Citations: 0
Data Sharing in the Metaverse With Key Abuse Resistance Based on Decentralized CP-ABE
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-05 DOI: 10.1109/TC.2024.3512177
Liang Zhang;Zhanrong Ou;Changhui Hu;Haibin Kan;Jiheng Zhang
Abstract: Data sharing is ubiquitous in the metaverse, which adopts blockchain as its foundation. Blockchain is employed because it enables data transparency, achieves tamper resistance, and supports smart contracts. However, securely sharing data based on blockchain necessitates further consideration. Ciphertext-policy attribute-based encryption (CP-ABE) is a promising primitive to provide confidentiality and fine-grained access control. Nonetheless, authority accountability and key abuse are critical issues that practical applications must address. Few studies have considered CP-ABE key confidentiality and authority accountability simultaneously. To our knowledge, we are the first to fill this gap by integrating non-interactive zero-knowledge (NIZK) proofs into CP-ABE keys and outsourcing the verification process to a smart contract. To meet the decentralization requirement, we incorporate a decentralized CP-ABE scheme into the proposed data sharing system. Additionally, we provide an implementation based on smart contract to determine whether an access control policy is satisfied by a set of CP-ABE keys. We also introduce an open incentive mechanism to encourage honest participation in data sharing. Hence, the key abuse issue is resolved through the NIZK proof and the incentive mechanism. We provide a theoretical analysis and conduct comprehensive experiments to demonstrate the feasibility and efficiency of the data sharing system. Based on the proposed accountable approach, we further illustrate an application in GameFi, where players can play to earn or contribute to an accountable DAO, fostering a thriving metaverse ecosystem.
Vol. 74, No. 3, pp. 901-914. Publication type: Journal Article.
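The question "is an access policy satisfied by a set of attributes" that this abstract delegates to a smart contract can be pictured, in its plain logical form, as evaluating a boolean tree over attribute names. The sketch below shows only that logical check in Python. It involves no cryptography, no NIZK proof, and no blockchain, and it is not the paper's contract; the policy encoding (nested tuples with "and"/"or") and the attribute names are hypothetical.

```python
def satisfies(policy, attributes):
    """Evaluate a boolean access policy against a set of attribute names.

    A policy is either an attribute string or a tuple ("and" | "or", child, ...).
    Illustration of the logical check only; real CP-ABE enforces the policy
    cryptographically during decryption.
    """
    if isinstance(policy, str):
        return policy in attributes
    op, *children = policy
    results = (satisfies(child, attributes) for child in children)
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op!r}")


policy = ("and", "player", ("or", "guild:alpha", "admin"))
print(satisfies(policy, {"player", "guild:alpha"}))
print(satisfies(policy, {"player"}))
```

Access structures in attribute-based encryption are often expressed as threshold gates or linear secret-sharing matrices rather than plain and/or trees; the tree form is used here only because it is the easiest to read.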
Citations: 0
High-Radix Generalized Hyperbolic CORDIC and Its Hardware Implementation
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date : 2024-12-04 DOI: 10.1109/TC.2024.3512183
Hui Chen;Lianghua Quan;Ke Chen;Weiqiang Liu
Abstract: In this paper, we propose a high-radix generalized hyperbolic coordinate rotation digital computer (HGH-CORDIC). This algorithm not only computes logarithmic and exponential functions with any fixed base but also significantly reduces the number of iterations required compared to traditional CORDIC methods. Initially, we present the general iteration formulas for HGH-CORDIC. Subsequently, we discuss its pivotal convergence properties and selection criteria, exemplifying these with commonly used cases. Through extensive software simulations, we validate the theoretical foundations of our approach. Finally, we explore efficient hardware implementation strategies. Our analysis indicates that, relative to state-of-the-art radix-2 GH-CORDIC, the proposed HGH-CORDIC can decrease the number of iterations by more than 50% while maintaining comparable accuracy. Synthesized under the 28nm CMOS technology, the reports show that the reference circuit can save about 40% area and power consumption on average for $2^{x}$ and $\log_{2} x$ calculations compared with the latest CORDIC method.
Vol. 74, No. 3, pp. 983-995. Publication type: Journal Article.
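As background for the iteration counts discussed in this abstract, the sketch below implements the classic radix-2 hyperbolic CORDIC in rotation mode, the kind of baseline that a high-radix variant aims to shorten. It is not the HGH-CORDIC algorithm. It uses floating-point arithmetic for readability, whereas hardware would use fixed-point shifts and a small table of atanh constants. The function names are hypothetical; the repeated iteration indices 4, 13, 40, ... are the standard requirement for convergence of hyperbolic CORDIC.

```python
import math


def hyperbolic_indices(n):
    """First n iteration indices 1, 2, 3, 4, 4, 5, ..., repeating 4, 13, 40, ..."""
    out = []
    i, repeat = 1, 4
    while len(out) < n:
        out.append(i)
        if i == repeat and len(out) < n:
            out.append(i)            # repeated step needed for convergence
            repeat = 3 * repeat + 1  # 4 -> 13 -> 40 -> ...
        i += 1
    return out


def cordic_exp(t, iterations=40):
    """Approximate e**t for |t| up to about 1.1 with radix-2 hyperbolic CORDIC."""
    idx = hyperbolic_indices(iterations)
    # Accumulated scaling of the micro-rotations; starting at 1/gain cancels it.
    gain = math.prod(math.sqrt(1.0 - 4.0 ** -i) for i in idx)
    x, y, z = 1.0 / gain, 0.0, t
    for i in idx:
        d = 1.0 if z >= 0 else -1.0
        step = 2.0 ** -i             # a right shift in fixed-point hardware
        x, y = x + d * y * step, y + d * x * step
        z -= d * math.atanh(step)    # a table lookup in hardware
    return x + y                     # cosh(t) + sinh(t) = e**t


def cordic_pow2(v, iterations=40):
    """Approximate 2**v by splitting v into integer and fractional parts."""
    k = math.floor(v)
    frac = v - k                     # in [0, 1), so frac * ln 2 is in range
    return math.ldexp(cordic_exp(frac * math.log(2.0), iterations), k)


print(cordic_pow2(0.5), 2 ** 0.5)
```

Each radix-2 iteration contributes roughly one bit of accuracy, so a 40-iteration run is needed for about 12 decimal digits; reducing that count is the motivation the abstract gives for moving to a higher radix.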
Citations: 0