Kcollections:面向K-mers的快速高效库

M. Fujimoto, Cole A. Lyman, M. Clement
{"title":"Kcollections:面向K-mers的快速高效库","authors":"M. Fujimoto, Cole A. Lyman, M. Clement","doi":"10.1109/IPDPSW50202.2020.00041","DOIUrl":null,"url":null,"abstract":"K-mers form the backbone of many bioinformatic algorithms. They are, however, difficult to store and use efficiently because the number of k-mers increases exponentially as $k$ increases. Many algorithms exist for compressed storage of kmers but suffer from slow insert times or are probabilistic resulting in false-positive k-mers. Furthermore, k-mer libraries usually specialize in associating specific values with k-mers such as a color in colored de Bruijn Graphs or k-mer count. We present kcollections1, a compressed and parallel data structure designed for k-mers generated from whole, assembled genomes. Kcollections is available for $\\mathrm {C}++$ and provides set-and maplike structures as well as a k-mer counting data structure all of which utilize parallel operations designed using a MapReduce paradigm. Additionally, we provide basic Python bindings for rapid prototyping. Kcollections makes developing bioinformatic algorithms simpler by abstracting away the tedious task of storing k-mers.1https://www.github.com/masakistan/kcollections","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Kcollections: A Fast and Efficient Library for K-mers\",\"authors\":\"M. Fujimoto, Cole A. Lyman, M. Clement\",\"doi\":\"10.1109/IPDPSW50202.2020.00041\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"K-mers form the backbone of many bioinformatic algorithms. They are, however, difficult to store and use efficiently because the number of k-mers increases exponentially as $k$ increases. Many algorithms exist for compressed storage of kmers but suffer from slow insert times or are probabilistic resulting in false-positive k-mers. Furthermore, k-mer libraries usually specialize in associating specific values with k-mers such as a color in colored de Bruijn Graphs or k-mer count. We present kcollections1, a compressed and parallel data structure designed for k-mers generated from whole, assembled genomes. Kcollections is available for $\\\\mathrm {C}++$ and provides set-and maplike structures as well as a k-mer counting data structure all of which utilize parallel operations designed using a MapReduce paradigm. Additionally, we provide basic Python bindings for rapid prototyping. Kcollections makes developing bioinformatic algorithms simpler by abstracting away the tedious task of storing k-mers.1https://www.github.com/masakistan/kcollections\",\"PeriodicalId\":398819,\"journal\":{\"name\":\"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW50202.2020.00041\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW50202.2020.00041","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

K-mers构成了许多生物信息学算法的主干。然而,它们很难有效地存储和使用,因为k-mers的数量随着k$的增加呈指数增长。存在许多用于压缩k-mers存储的算法,但它们的插入时间很慢,或者很可能导致k-mers假阳性。此外,k-mer库通常专门用于将特定值与k-mer相关联,例如彩色de Bruijn图中的颜色或k-mer计数。我们提出了kcollection1,这是一种压缩的并行数据结构,专为从整个组装基因组中产生的k-mers而设计。Kcollections可以在$\ mathm {C}++$中使用,它提供了类似set和map的结构以及k-mer计数数据结构,所有这些都利用了使用MapReduce范式设计的并行操作。此外,我们还提供了用于快速原型的基本Python绑定。Kcollections抽象了存储k-mers的繁琐任务,使开发生物信息学算法变得更加简单
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Kcollections: A Fast and Efficient Library for K-mers
K-mers form the backbone of many bioinformatic algorithms. They are, however, difficult to store and use efficiently because the number of k-mers increases exponentially as $k$ increases. Many algorithms exist for compressed storage of kmers but suffer from slow insert times or are probabilistic resulting in false-positive k-mers. Furthermore, k-mer libraries usually specialize in associating specific values with k-mers such as a color in colored de Bruijn Graphs or k-mer count. We present kcollections1, a compressed and parallel data structure designed for k-mers generated from whole, assembled genomes. Kcollections is available for $\mathrm {C}++$ and provides set-and maplike structures as well as a k-mer counting data structure all of which utilize parallel operations designed using a MapReduce paradigm. Additionally, we provide basic Python bindings for rapid prototyping. Kcollections makes developing bioinformatic algorithms simpler by abstracting away the tedious task of storing k-mers.1https://www.github.com/masakistan/kcollections
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信