CiMBA: Accelerating Genome Sequencing Through On-Device Basecalling via Compute-in-Memory

IF 6 | Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, THEORY & METHODS

IEEE Transactions on Parallel and Distributed Systems Pub Date : 2025-03-12 DOI:10.1109/TPDS.2025.3550811

William Andrew Simon;Irem Boybat;Riselda Kodra;Elena Ferro;Gagandeep Singh;Mohammed Alser;Shubham Jain;Hsinyu Tsai;Geoffrey W. Burr;Onur Mutlu;Abu Sebastian

IEEE Transactions on Parallel and Distributed Systems, vol. 36, no. 6, pp. 1130-1145. https://ieeexplore.ieee.org/document/10924297/

Citations: 0

Abstract



As genome sequencing is finding utility in a wide variety of domains beyond the confines of traditional medical settings, its computational pipeline faces two significant challenges. First, the creation of up to 0.5 GB of data per minute imposes substantial communication and storage overheads. Second, the sequencing pipeline is bottlenecked at the basecalling step, consuming >40% of genome analysis time. A range of proposals have attempted to address these challenges, with limited success. We propose to address these challenges with a Compute-in-Memory Basecalling Accelerator (CiMBA), the first embedded ($\sim 25$ mm$^{2}$) accelerator capable of real-time, on-device basecalling, coupled with AnaLog (AL)-Dorado, a new family of analog focused basecalling DNNs. Our resulting hardware/software co-design greatly reduces data communication overhead, is capable of a throughput of 4.77 million bases per second, 24× that required for real-time operation, and achieves 17×/27× power/area efficiency over the best prior basecalling embedded accelerator while maintaining a high accuracy comparable to state-of-the-art software basecallers.
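The figures quoted in the abstract imply two rates that it does not state directly. The sketch below is a back-of-the-envelope calculation, not taken from the paper: it derives the real-time basecalling rate implied by "4.77 million bases per second, 24× that required" and converts "0.5 GB of data per minute" to a per-second rate. The function and constant names are illustrative, and the conversion assumes 1 GB = 1000 MB, since the abstract does not say whether decimal or binary units are meant.

```python
# Back-of-the-envelope arithmetic on the numbers quoted in the abstract.
# Assumption (not from the source): 1 GB = 1000 MB (decimal units).

THROUGHPUT_BASES_PER_S = 4.77e6  # stated accelerator throughput
REALTIME_MARGIN = 24             # stated multiple of the real-time requirement
RAW_DATA_GB_PER_MIN = 0.5        # stated upper bound on raw data produced


def implied_realtime_rate(throughput: float, margin: float) -> float:
    """Bases per second needed for real-time operation, implied by the margin."""
    return throughput / margin


def raw_data_rate_mb_per_s(gb_per_min: float) -> float:
    """Convert a GB-per-minute data rate to MB per second (decimal units)."""
    return gb_per_min * 1000 / 60


realtime_rate = implied_realtime_rate(THROUGHPUT_BASES_PER_S, REALTIME_MARGIN)
raw_rate = raw_data_rate_mb_per_s(RAW_DATA_GB_PER_MIN)

print(f"Implied real-time requirement: {realtime_rate:,.0f} bases/s")
print(f"Raw data rate: {raw_rate:.2f} MB/s")
```

Under these assumptions the implied real-time requirement is about 198,750 bases per second, and the raw data stream is roughly 8.33 MB per second at its stated upper bound.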


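Source journal: IEEE Transactions on Parallel and Distributed Systems (Engineering & Technology, Engineering: Electrical & Electronic)

CiteScore: 11.00

Self-citation rate: 9.40%

Articles published: 281

Review time: 5.6 months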

Journal description: IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers. Particular areas of interest include, but are not limited to:

a) Parallel and distributed algorithms, focusing on topics such as: models of computation; numerical, combinatorial, and data-intensive parallel algorithms, scalability of algorithms and data structures for parallel and distributed systems, communication and synchronization protocols, network algorithms, scheduling, and load balancing.

b) Applications of parallel and distributed computing, including computational and data-enabled science and engineering, big data applications, parallel crowd sourcing, large-scale social network analysis, management of big data, cloud and grid computing, scientific and biomedical applications, mobile computing, and cyber-physical systems.

c) Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; design, analysis, implementation, fault resilience and performance measurements of multiple-processor systems; multicore processors, heterogeneous many-core systems; petascale and exascale systems designs; novel big data architectures; special purpose architectures, including graphics processors, signal processors, network processors, media accelerators, and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient and green computing architectures; dependable architectures; and performance modeling and evaluation.

d) Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, Internet computing and web services, resource management including green computing, middleware for grids, clouds, and data centers, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools.