{"title":"FPonAP: Implementation of Floating Point Operations on Associative Processors","authors":"Walaa Amer;Mariam Rakka;Fadi Kurdahi","doi":"10.1109/LES.2024.3446912","DOIUrl":null,"url":null,"abstract":"The associative processor (AP) is a processing in-memory (PIM) platform that avoids data movement between the memory and the processor by running computations directly in the memory. It is a parallel architecture based on content addressable memory (CAM), allowing it to address data by its content and thus accelerating search and pattern recognition tasks. APs are suggested as a promising solution to the memory wall caused by the data movement bottleneck in traditional Von-Neumann architectures for data-driven applications, such as machine learning. However, modern implementations of the AP still lack support for floating point (FP) operations that are heavily used in the target applications. In this letter, we present a novel implementation of FP operations on the AP and evaluate its performance on the levels of latency and energy, showing that the proposed solution outperforms parallel FP execution on central processing unit and even GPU for large vector sizes.","PeriodicalId":56143,"journal":{"name":"IEEE Embedded Systems Letters","volume":"16 4","pages":"389-392"},"PeriodicalIF":1.7000,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Embedded Systems Letters","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10779982/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract
The associative processor (AP) is a processing-in-memory (PIM) platform that avoids data movement between the memory and the processor by running computations directly in the memory. It is a parallel architecture based on content addressable memory (CAM), allowing it to address data by its content and thus accelerate search and pattern recognition tasks. APs are suggested as a promising solution to the memory wall caused by the data movement bottleneck of traditional von Neumann architectures in data-driven applications, such as machine learning. However, modern implementations of the AP still lack support for floating point (FP) operations, which are heavily used in the target applications. In this letter, we present a novel implementation of FP operations on the AP and evaluate its performance in terms of latency and energy, showing that the proposed solution outperforms parallel FP execution on the central processing unit (CPU) and even the GPU for large vector sizes.
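For readers unfamiliar with the AP compute model, the sketch below is a minimal software simulation of the classic bit-serial, word-parallel paradigm that APs build on: an arithmetic operation is carried out by sweeping the truth table of a single-bit operation over all CAM rows as compare/write passes applied in parallel. The example adds two integer vectors using a full-adder truth table; the helper names (`ap_add`, `WORD_BITS`) and the NumPy-based simulation are illustrative assumptions only and do not reproduce the letter's FP algorithm or hardware.

```python
import numpy as np

# Minimal simulation of the generic associative-processor compute model:
# words are stored as rows of a CAM, and an operation is applied to all rows
# in parallel through a sequence of (compare pattern, write pattern) passes
# derived from the operation's truth table. This adds two unsigned integer
# columns bit-serially; it illustrates the paradigm, not the letter's FP scheme.

WORD_BITS = 8  # assumed word width for this toy example

def ap_add(a_col, b_col):
    """Word-parallel, bit-serial addition of two CAM columns."""
    rows = a_col.shape[0]
    a = a_col.copy()                       # the sum accumulates in place of A
    carry = np.zeros(rows, dtype=int)      # one carry bit per CAM row

    # Full-adder truth table: (a_bit, b_bit, carry_in) -> (sum_bit, carry_out)
    truth_table = {
        (0, 0, 0): (0, 0), (0, 0, 1): (1, 0),
        (0, 1, 0): (1, 0), (0, 1, 1): (0, 1),
        (1, 0, 0): (1, 0), (1, 0, 1): (0, 1),
        (1, 1, 0): (0, 1), (1, 1, 1): (1, 1),
    }

    for bit in range(WORD_BITS):
        a_bit = (a >> bit) & 1
        b_bit = (b_col >> bit) & 1
        new_a, new_carry = a_bit.copy(), carry.copy()
        # Each truth-table entry corresponds to one compare pass (select all
        # rows matching the key) followed by one parallel write pass.
        for (ka, kb, kc), (s, c) in truth_table.items():
            match = (a_bit == ka) & (b_bit == kb) & (carry == kc)
            new_a[match] = s
            new_carry[match] = c
        a = np.where(new_a == 1, a | (1 << bit), a & ~(1 << bit))
        carry = new_carry
    return a

# Example: element-wise addition of two vectors stored in the CAM.
A = np.array([3, 100, 27, 200])
B = np.array([5,  28, 99,  55])
print(ap_add(A, B))  # -> [  8 128 126 255]
```

An FP operation on an AP would be decomposed into such bit-serial integer passes over the sign, exponent, and mantissa fields; the letter evaluates the latency and energy of that style of execution against CPU and GPU baselines.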
Journal Introduction
The IEEE Embedded Systems Letters (ESL) provides a forum for rapid dissemination of the latest technical advances in embedded systems and related areas of embedded software. The emphasis is on models, methods, and tools that ensure secure, correct, efficient, and robust design of embedded systems and their applications.