CIMUS: 3D-Stacked Computing-in-Memory Under Image Sensor Architecture for Efficient Machine Vision

IF 3.8 | Zone 2, Computer Science | Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computers Pub Date : 2025-04-08 DOI:10.1109/TC.2025.3558068

Lixia Han;Yiyang Chen;Siyuan Chen;Haozhang Yang;Ao Shi;Guihai Yu;Jiaqi Li;Zheng Zhou;Yijiao Wang;Yanzhi Wang;Xiaoyan Liu;Jinfeng Kang;Peng Huang

IEEE Transactions on Computers, vol. 74, no. 7, pp. 2321-2333. https://ieeexplore.ieee.org/document/10949700/

Citations: 0


CIMUS: 3D-Stacked Computing-in-Memory Under Image Sensor Architecture for Efficient Machine Vision

Computational image sensors with CNN processing capabilities are emerging to alleviate the energy-intensive and time-consuming data movement between sensors and external processors. However, deploying CNN models onto these computational image sensors is hindered by limited on-chip memory resources and insufficient image processing throughput. This work proposes a 3D-stacked NAND flash-based computing-in-memory under image sensor architecture (CIMUS) to facilitate the complete deployment of CNN models. To fully exploit the high bandwidth offered by 3D-stacked integration, we design a novel distributed CNN mapping and dataflow that processes the full focal-plane image in parallel, sensing and recognizing ImageNet images at >1000 fps. To tackle the computational error caused by “0” inputs in 3D NAND flash-based CIM, we propose an input-independent offset compensation method, which reduces the average vector-matrix multiplication (VMM) error by 48%. Evaluation results indicate that the CIMUS architecture achieves a 9.8× improvement in CNN inference speed and a 33× boost in energy efficiency compared to the state-of-the-art computational image sensor on the ImageNet recognition task.
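The abstract does not describe how the offset compensation works, so the following is only a toy illustration of the general idea, not the authors' method. It assumes a simple error model in which a cell driven by a “0” input still contributes a small leakage term proportional to its weight, and it subtracts one fixed offset per output column, computed without looking at the actual input. The leakage model, the names `leaky_vmm` and `column_offsets`, and the values of `LEAK` and `P_ZERO` are all hypothetical.

```python
import random

def ideal_vmm(x, W):
    """Ideal vector-matrix product: y[j] = sum_i x[i] * W[i][j]."""
    cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(cols)]

def leaky_vmm(x, W, leak):
    """Toy non-ideal VMM (assumed model): a cell whose input is 0
    still adds leak * W[i][j] to its column."""
    cols = len(W[0])
    return [
        sum((x[i] + leak * (1 - x[i])) * W[i][j] for i in range(len(x)))
        for j in range(cols)
    ]

def column_offsets(W, leak, p_zero):
    """One fixed offset per column: the expected leakage when a fraction
    p_zero of the inputs is 0. It does not depend on the actual input."""
    cols = len(W[0])
    return [leak * p_zero * sum(row[j] for row in W) for j in range(cols)]

def mean_abs_error(W, inputs, leak, offsets):
    """Average |measured - offset - ideal| over all inputs and columns."""
    total, count = 0.0, 0
    for x in inputs:
        ref = ideal_vmm(x, W)
        out = leaky_vmm(x, W, leak)
        for j in range(len(ref)):
            total += abs(out[j] - offsets[j] - ref[j])
            count += 1
    return total / count

# Hypothetical sizes and parameters, chosen only for the demonstration.
rng = random.Random(0)
ROWS, COLS, LEAK, P_ZERO = 64, 8, 0.05, 0.5
W = [[rng.random() for _ in range(COLS)] for _ in range(ROWS)]
inputs = [
    [1 if rng.random() >= P_ZERO else 0 for _ in range(ROWS)]
    for _ in range(200)
]

raw_error = mean_abs_error(W, inputs, LEAK, [0.0] * COLS)
compensated_error = mean_abs_error(
    W, inputs, LEAK, column_offsets(W, LEAK, P_ZERO)
)
print(f"mean |error| without compensation: {raw_error:.4f}")
print(f"mean |error| with fixed offsets:   {compensated_error:.4f}")
```

In this toy model the uncompensated error is always positive, so removing its expected value leaves only a zero-centered residual. The size of the reduction depends entirely on the assumed parameters and should not be compared with the 48% figure reported in the abstract.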


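Source journal

IEEE Transactions on Computers (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 6.60

Self-citation rate: 5.40%

Articles published: 199

Review time: 6.0 months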

Journal description: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.