Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly

Yatish Turakhia, G. Bejerano, W. Dally
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, published 2018-03-19
DOI: 10.1145/3173162.3173193 (https://doi.org/10.1145/3173162.3173193)
Citations: 91

Abstract

Genomics is transforming medicine and our understanding of life in fundamental ways. Genomics data, however, is far outpacing Moore's Law. Third-generation sequencing technologies produce 100X longer reads than second-generation technologies and reveal a much broader mutation spectrum of disease and evolution. However, these technologies incur prohibitively high computational costs. Over 1,300 CPU hours are required for reference-guided assembly of the human genome, and over 15,600 CPU hours are required for de novo assembly. This paper describes "Darwin", a co-processor for genomic sequence alignment that, without sacrificing sensitivity, provides up to 15,000X speedup over the state-of-the-art software for reference-guided assembly of third-generation reads. Darwin achieves this speedup through hardware/algorithm co-design, trading more easily accelerated alignment for less memory-intensive filtering, and by optimizing the memory system for filtering. Darwin combines a hardware-accelerated version of D-SOFT, a novel filtering algorithm, with a hardware-accelerated version of GACT, a novel alignment algorithm, to perform filtering and alignment at high speed. GACT generates near-optimal alignments of arbitrarily long genomic sequences using constant memory for the compute-intensive step. Darwin is adaptable, with tunable speed and sensitivity to match emerging sequencing technologies and to meet the requirements of genomic applications beyond read assembly.
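The abstract does not say how GACT keeps memory constant for arbitrarily long sequences. As a rough illustration of one way this can work, the sketch below aligns two sequences in fixed-size overlapping tiles, so the dynamic-programming matrix never grows beyond one tile. This is an assumed, simplified scheme, not the paper's algorithm: the function names `align_tile` and `tiled_align`, the tile size, the overlap, and the scoring values are all hypothetical.

```python
def align_tile(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of two short strings (one tile).

    Returns a list of operations: "M" consumes one character of each
    string, "D" consumes one of `a` only, "I" consumes one of `b` only.
    Scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner of the tile.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                ops.append("M")
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            ops.append("D")
            i -= 1
        else:
            ops.append("I")
            j -= 1
    ops.reverse()
    return ops


def tiled_align(a, b, tile=8, overlap=3):
    """Align `a` and `b` tile by tile (hypothetical simplification).

    Each tile needs only a (tile+1) x (tile+1) matrix, independent of
    the full sequence lengths. The last `overlap` positions of every
    non-final tile are discarded and re-aligned by the next tile.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    keep = tile - overlap
    i = j = 0
    path = []
    while i < len(a) or j < len(b):
        ops = align_tile(a[i:i + tile], b[j:j + tile])
        is_last = i + tile >= len(a) and j + tile >= len(b)
        used_a = used_b = 0
        for op in ops:
            if not is_last and (used_a >= keep or used_b >= keep):
                break  # leave the rest for the next, overlapping tile
            path.append(op)
            if op != "I":
                used_a += 1
            if op != "D":
                used_b += 1
        i += used_a
        j += used_b
    return "".join(path)
```

Calling `tiled_align(read, reference)` returns a string of `M`, `D`, and `I` operations covering both inputs. A larger overlap lets each tile revise more of the previous tile's tail at the cost of repeated work, which is the kind of speed-versus-accuracy knob the abstract alludes to.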