Convolutional Tsetlin Machine-based Training and Inference Accelerator for 2-D Pattern Classification

IF 1.9 | CAS Zone 4 (Computer Science) | JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Svein Anders Tunheim, Lei Jiao, Rishad Shafik, Alex Yakovlev, Ole-Christoffer Granmo
{"title":"Convolutional Tsetlin Machine-based Training and Inference Accelerator for 2-D Pattern Classification","authors":"Svein Anders Tunheim ,&nbsp;Lei Jiao ,&nbsp;Rishad Shafik ,&nbsp;Alex Yakovlev ,&nbsp;Ole-Christoffer Granmo","doi":"10.1016/j.micpro.2023.104949","DOIUrl":null,"url":null,"abstract":"<div><p>The Tsetlin Machine (TM) is a machine learning algorithm based on an ensemble of Tsetlin Automata (TAs) that learns propositional logic expressions from Boolean input features. In this paper, the design and implementation of a Field Programmable Gate Array (FPGA) accelerator based on the Convolutional Tsetlin Machine (CTM) is presented. The accelerator performs classification of two pattern classes in 4 × 4 Boolean images with a 2 × 2 convolution window. Specifically, there are two separate TMs, one per class. Each TM comprises 40 propositional logic formulas, denoted as clauses, which are conjunctions of literals. Include/exclude actions from the TAs determine which literals are included in each clause. The accelerator supports full training, including random patch selection during convolution based on parallel reservoir sampling across all clauses. The design is implemented on a Xilinx Zynq XC7Z020 FPGA platform. With an operating clock speed of 40 MHz, the accelerator achieves a classification rate of 4.4 million images per second with an energy per classification of 0.6 <span><math><mi>μ</mi></math></span>J. The mean test accuracy is 99.9% when trained on the 2-dimensional Noisy XOR dataset with 40% noise in the training labels. To achieve this performance, which is on par with the original software implementation, Linear Feedback Shift Register (LFSR) random number generators of minimum 16 bits are required. The solution demonstrates the core principles of a CTM and can be scaled to operate on multi-class systems for larger images.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"103 ","pages":"Article 104949"},"PeriodicalIF":1.9000,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microprocessors and Microsystems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S014193312300193X","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

The Tsetlin Machine (TM) is a machine learning algorithm based on an ensemble of Tsetlin Automata (TAs) that learns propositional logic expressions from Boolean input features. In this paper, the design and implementation of a Field Programmable Gate Array (FPGA) accelerator based on the Convolutional Tsetlin Machine (CTM) are presented. The accelerator classifies two pattern classes in 4 × 4 Boolean images using a 2 × 2 convolution window. Specifically, there are two separate TMs, one per class. Each TM comprises 40 propositional logic formulas, denoted as clauses, which are conjunctions of literals. Include/exclude actions from the TAs determine which literals are included in each clause. The accelerator supports full training, including random patch selection during convolution based on parallel reservoir sampling across all clauses. The design is implemented on a Xilinx Zynq XC7Z020 FPGA platform. With an operating clock speed of 40 MHz, the accelerator achieves a classification rate of 4.4 million images per second with an energy per classification of 0.6 μJ. The mean test accuracy is 99.9% when trained on the 2-dimensional Noisy XOR dataset with 40% noise in the training labels. To achieve this performance, which is on par with the original software implementation, Linear Feedback Shift Register (LFSR) random number generators with a minimum length of 16 bits are required. The solution demonstrates the core principles of a CTM and can be scaled to multi-class systems and larger images.
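To make the mechanisms named above more concrete, the sketch below shows, in plain Python, how clauses built from TA-selected literals are evaluated over the nine 2 × 2 patches of a 4 × 4 Boolean image, how one matching patch can be drawn in a single pass with reservoir sampling, and how a 16-bit LFSR can serve as the randomness source. It is a minimal illustration of the standard CTM formulation, not the authors' FPGA design: the clause polarity scheme, the omission of patch-coordinate literals, the LFSR tap choice, and all names are assumptions made for readability.

```python
# Minimal software sketch of the CTM mechanisms named in the abstract.
# NOT the authors' FPGA implementation: polarity scheme, tap choice and all
# names are illustrative assumptions; patch-coordinate literals are omitted.

class LFSR16:
    """16-bit Fibonacci LFSR (taps 16, 14, 13, 11; a maximal-length polynomial)."""

    def __init__(self, seed=0xACE1):
        assert seed != 0, "an LFSR must not be seeded with all zeros"
        self.state = seed & 0xFFFF

    def next_bit(self):
        # Feedback bit is the XOR of the tapped positions.
        bit = ((self.state >> 0) ^ (self.state >> 2) ^
               (self.state >> 3) ^ (self.state >> 5)) & 1
        self.state = (self.state >> 1) | (bit << 15)
        return bit

    def next_word(self):
        # Assemble a 16-bit pseudo-random value bit by bit.
        return sum(self.next_bit() << k for k in range(16))


def patches(image):
    """All 2x2 patches of a 4x4 Boolean image (3x3 = 9 window positions)."""
    return [[row[c:c + 2] for row in image[r:r + 2]]
            for r in range(3) for c in range(3)]


def literals(patch):
    """Patch bits plus their negations: the literals a clause can include."""
    bits = [patch[r][c] for r in range(2) for c in range(2)]
    return bits + [1 - b for b in bits]


def clause_output(include_mask, lits):
    """A clause is the AND of the literals whose TAs chose 'include'.
    (A clause with no included literals evaluates to 1 here; implementations differ.)"""
    return int(all(l for inc, l in zip(include_mask, lits) if inc))


def class_score(image, clause_includes):
    """Sum of clause votes; even-indexed clauses vote +1, odd-indexed -1.
    A convolutional clause fires if it matches at least one patch."""
    score = 0
    for j, inc in enumerate(clause_includes):
        fired = any(clause_output(inc, literals(p)) for p in patches(image))
        score += fired if j % 2 == 0 else -fired
    return score


def reservoir_pick(matching_patch_ids, lfsr):
    """Pick one matching patch (approximately) uniformly in a single pass
    (size-1 reservoir sampling), as used for random patch selection in training."""
    chosen = None
    for n, idx in enumerate(matching_patch_ids, start=1):
        if lfsr.next_word() % n == 0:   # keep the new item with probability ~1/n
            chosen = idx
    return chosen
```

With one such clause bank per class, the predicted class is simply the one with the highest score; the accelerator described in the paper evaluates the clauses and patches in parallel hardware rather than in sequential loops.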

Source journal
Microprocessors and Microsystems (Engineering & Technology - Engineering: Electrical & Electronic)
CiteScore: 6.90
Self-citation rate: 3.80%
Articles per year: 204
Review time: 172 days
Journal description: Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects related to embedded systems hardware. This includes different embedded system hardware platforms, ranging from custom hardware via reconfigurable systems and application-specific processors to general-purpose embedded processors. Special emphasis is put on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC) and multi-processor systems on a chip (MPSoC), as well as their memory and communication methods and structures, such as network-on-chip (NoC). Design automation of such systems, including methodologies, techniques, flows and tools for their design, as well as novel designs of hardware components, falls within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central in this journal. While software is not the main focus of this journal, methods of hardware/software co-design, as well as application restructuring and mapping to embedded hardware platforms that consider the interplay between software and hardware components with emphasis on hardware, are also in the journal scope.