LUT-Based Convolutional Tsetlin Machine Accelerator With Dynamic Clause Scaling for Resources-Constrained FPGAs

Impact Factor: 2.7 · Q3 (Computer Science, Hardware & Architecture)
Rashed Al Amin, Roman Obermaisser
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 12, pp. 45–55, 2026. DOI: 10.1109/JXCDC.2026.3676833. Published online: March 23, 2026.
Citations: 0

Abstract

The rapid growth of machine learning (ML) workloads, particularly in computer vision applications, has significantly increased computational and energy demands in modern electronic systems, motivating the use of hardware accelerators to offload processing from general-purpose processors. Despite advances in computationally efficient ML models, achieving energy-efficient inference on resource-constrained edge devices remains a significant challenge. The Tsetlin machine (TM) has emerged as an attractive alternative for image classification due to its high throughput and inherently energy-efficient learning paradigm. However, existing TM-based hardware accelerators struggle to balance classification accuracy and energy efficiency, limiting their practical deployment at the edge. This article presents a resource- and energy-efficient convolutional TM (CTM) accelerator with dynamic clause scaling, optimized explicitly for edge field-programmable gate array (FPGA) platforms. The proposed architecture employs LUT-based pipelining and targeted resource-optimization techniques to minimize FPGA resource utilization while maintaining high energy efficiency and performance. The accelerator is implemented on a Xilinx Zybo-Z20 FPGA and evaluated using the MNIST, Fashion-MNIST (FMNIST), and Kuzushiji-MNIST (KMNIST) datasets, achieving classification accuracies of 97.78%, 85.53%, and 88.54%, respectively, with an energy consumption of up to 0.3 µJ per image classification. Compared with state-of-the-art CTM accelerators, the proposed design achieves up to 40× improvements in resource and energy efficiency, demonstrating its suitability for real-time image and pattern classification on edge FPGA-based systems.
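For readers unfamiliar with TM inference, the sketch below illustrates the clause-voting scheme that TM accelerators map onto LUT logic: each clause is a conjunction (AND) of included Boolean literals, and a class score is the sum of positive-polarity clause outputs minus negative-polarity ones. The `n_active` parameter is a hypothetical stand-in for the paper's dynamic clause scaling (trading accuracy against energy by evaluating fewer clauses); the masks, function names, and the alternating-polarity convention follow the standard TM literature, not this article's specific implementation.

```python
def clause_output(include_mask, negate_mask, x):
    """A clause fires (returns 1) iff every included literal is satisfied.

    include_mask[k] -- 1 if literal k participates in the clause
    negate_mask[k]  -- 1 if literal k is the negated feature ~x[k]
    x               -- Boolean input features (0/1)
    """
    for inc, neg, xi in zip(include_mask, negate_mask, x):
        # An included literal is unsatisfied when the feature equals
        # its negation flag (xi == neg means x[k] fails, ~x[k] fails).
        if inc and xi == neg:
            return 0
    return 1

def class_score(clauses, x, n_active=None):
    """Sum clause votes; odd-indexed clauses vote negatively.

    n_active caps how many clauses are evaluated -- an illustrative
    stand-in for dynamic clause scaling.
    """
    active = clauses if n_active is None else clauses[:n_active]
    score = 0
    for j, (inc, neg) in enumerate(active):
        vote = clause_output(inc, neg, x)
        score += vote if j % 2 == 0 else -vote
    return score

# Two clauses over two features: clause 0 votes for "x0 AND NOT x1",
# clause 1 votes against "NOT x0".
clauses = [([1, 1], [0, 1]), ([1, 0], [1, 0])]
print(class_score(clauses, [1, 0]))  # → 1 (positive clause fires alone)
```

In hardware, each clause reduces to a small AND/OR network over the literal bits, which is why LUT-based pipelining maps naturally onto FPGA fabric; capping the active clause count scales the logic actually toggling per inference.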
Source journal metrics: CiteScore 5.00; self-citation rate 4.20%; 11 articles published; review time 13 weeks.