CIMUS: 3D-Stacked Computing-in-Memory Under Image Sensor Architecture for Efficient Machine Vision
Authors: Lixia Han; Yiyang Chen; Siyuan Chen; Haozhang Yang; Ao Shi; Guihai Yu; Jiaqi Li; Zheng Zhou; Yijiao Wang; Yanzhi Wang; Xiaoyan Liu; Jinfeng Kang; Peng Huang
Journal: IEEE Transactions on Computers, vol. 74, no. 7, pp. 2321-2333 (published 2025-04-08; impact factor 3.8; JCR Q2, Computer Science, Hardware & Architecture)
DOI: 10.1109/TC.2025.3558068
URL: https://ieeexplore.ieee.org/document/10949700/
Computational image sensors with CNN processing capabilities are emerging to alleviate the energy-intensive and time-consuming data movement between sensors and external processors. However, deploying CNN models onto these computational image sensors is challenged by limited on-chip memory resources and insufficient image processing throughput. This work proposes a 3D-stacked NAND flash-based computing-in-memory under image sensor architecture (CIMUS) to facilitate the complete deployment of CNN models. To fully exploit the high bandwidth offered by 3D-stacked integration, we design a novel distributed CNN mapping and dataflow that processes the full focal plane image in parallel, sensing and recognizing ImageNet tasks at >1000 fps. To tackle the computational error associated with “0” inputs in 3D NAND flash-based CIM, we propose an input-independent offset compensation method, which reduces the average vector-matrix multiplication (VMM) error by 48%. Evaluation results indicate that the CIMUS architecture achieves a 9.8× improvement in CNN inference speed and a 33× boost in energy efficiency compared to the state-of-the-art computational image sensor in the ImageNet recognition task.
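The abstract does not describe how the input-independent offset is derived, so the following is only a toy illustration of the general idea, not the authors' method. It assumes a hypothetical error model in which a cell driven with input “0” still contributes a fixed fraction (`LEAK`) of its weight to the column sum, and it calibrates a single constant offset as the mean error over random calibration inputs. All names, the leak value, and the weight range are assumptions made for this sketch.

```python
import random

LEAK = 0.05  # hypothetical leakage fraction for cells driven with input 0


def ideal_vmm(inputs, weights):
    """Exact dot product of a binary input vector and one weight column."""
    return sum(x * w for x, w in zip(inputs, weights))


def measured_vmm(inputs, weights, leak=LEAK):
    """Toy non-ideal column sum: a '0' input still adds leak * weight."""
    return sum((x + (1 - x) * leak) * w for x, w in zip(inputs, weights))


def calibrate_offset(weights, calibration_inputs, leak=LEAK):
    """One constant per column: the mean error over the calibration inputs."""
    errors = [measured_vmm(x, weights, leak) - ideal_vmm(x, weights)
              for x in calibration_inputs]
    return sum(errors) / len(errors)


def mean_abs_error(weights, test_inputs, offset=0.0, leak=LEAK):
    """Average absolute VMM error after subtracting a fixed offset."""
    errors = [abs(measured_vmm(x, weights, leak) - offset - ideal_vmm(x, weights))
              for x in test_inputs]
    return sum(errors) / len(errors)


def demo(seed=0, n_cells=64, n_vectors=200):
    """Return (uncompensated, compensated) mean absolute error."""
    rng = random.Random(seed)
    weights = [rng.randint(0, 3) for _ in range(n_cells)]
    calibration = [[rng.randint(0, 1) for _ in range(n_cells)]
                   for _ in range(n_vectors)]
    test = [[rng.randint(0, 1) for _ in range(n_cells)]
            for _ in range(n_vectors)]
    offset = calibrate_offset(weights, calibration)
    return mean_abs_error(weights, test), mean_abs_error(weights, test, offset)


if __name__ == "__main__":
    raw, compensated = demo()
    print(f"mean |error| without offset: {raw:.3f}")
    print(f"mean |error| with offset:    {compensated:.3f}")
```

Because the offset is the same constant for every input, it cannot cancel the error exactly. In this toy model it removes only the average leakage contribution and leaves the input-dependent residue, which is consistent with a method that reduces the average VMM error rather than eliminating it.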
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.