A Tropical Semiring Multiple Matrix-Product Library on GPUs: (not just) a step towards RNA-RNA Interaction Computations

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) | Pub Date: 2020-05-01 | DOI: 10.1109/IPDPSW50202.2020.00037

Brandon Gildemaster, P. Ghalsasi, S. Rajopadhye


Citations: 2

Abstract



RNA-RNA interaction (RRI) is important in processes such as gene regulation, and certain classes of RRI are known to play roles in various diseases, including cancer and Alzheimer's. Other classes are not as well studied but could have biological importance, so there is a need for high-throughput tools that enable the study of these molecules. Current computational tools for RRI are slow: execution times run to days, weeks, or even months for large experiments, because the algorithms have time and space complexity of $\mathrm{O}(N^{3}M^{3})$ and $\mathrm{O}(N^{2}M^{2})$ respectively, for two sequences of lengths $N$ and $M$. No GPU parallelization of such algorithms exists. We show how the most computationally expensive portion of RRI base pair maximization algorithms, an $\mathrm{O}(N^{3}M^{3})$ computation, can be expressed as $\mathrm{O}(N^{3})$ instances of matrix products. We therefore propose an optimized library for the core computation of BPMax, an RRI algorithm based on weighted base pair counting. Our library multiplies multiple pairs of matrices in the max-plus semiring. We explore multiple tradeoffs: a square matrix product library attains close to the machine peak, but performs 6-fold unnecessary computation and has a $2\times$ larger data footprint, while the one with the minimum work and memory footprint suffers from thread divergence and unbalanced load. We also specialize for upper banded (trapezoidal) matrices, which are relevant to a windowed version of the algorithm. STOP PRESS: just before we submitted the camera-ready version of the paper, we incorporated our library into a GPU implementation of the complete BPMax algorithm. We will report performance numbers at the workshop.
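To make the max-plus product and the square-versus-minimum-work tradeoff concrete, here is a minimal pure-Python sketch. It is not the paper's GPU library: the function names are hypothetical, and the upper-triangular shape of the inputs is an assumption, chosen because it is consistent with the 6-fold and $2\times$ figures quoted in the abstract. In the max-plus (tropical) semiring, "addition" is max and "multiplication" is +, so $C_{ij} = \max_k (A_{ik} + B_{kj})$.

```python
NEG_INF = float("-inf")  # identity for max, i.e. the semiring "zero"


def maxplus_matmul(A, B):
    """Full square product: C[i][j] = max over k of A[i][k] + B[k][j]."""
    n = len(A)
    C = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            C[i][j] = max(A[i][k] + B[k][j] for k in range(n))
    return C


def maxplus_matmul_upper(A, B):
    """Same product, assuming A and B are upper triangular (-inf below the
    diagonal). Only indices with i <= k <= j can contribute, so the loops
    skip everything else. Returns (C, number of inner-loop steps)."""
    n = len(A)
    C = [[NEG_INF] * n for _ in range(n)]
    steps = 0
    for i in range(n):
        for j in range(i, n):
            best = NEG_INF
            for k in range(i, j + 1):
                best = max(best, A[i][k] + B[k][j])
                steps += 1
            C[i][j] = best
    return C, steps


def maxplus_batch(pairs):
    """Multiply several independent (A, B) pairs, one product per pair."""
    return [maxplus_matmul(A, B) for A, B in pairs]


if __name__ == "__main__":
    n = 4
    # Illustrative upper-triangular inputs (values are arbitrary).
    A = [[i + j if j >= i else NEG_INF for j in range(n)] for i in range(n)]
    B = [[i * j if j >= i else NEG_INF for j in range(n)] for i in range(n)]
    full = maxplus_matmul(A, B)
    tri, steps = maxplus_matmul_upper(A, B)
    print(full == tri)        # both variants agree on triangular inputs
    print(steps, n ** 3)      # triangular steps vs. full square steps
```

Under the triangular assumption, the restricted loops execute $n(n+1)(n+2)/6$ inner steps, roughly $n^{3}/6$, against $n^{3}$ for the full square product, and only about half of each matrix needs to be stored. The shrinking loop bounds are also where uneven work per output element comes from, which on a GPU would plausibly show up as the thread divergence and load imbalance the abstract mentions.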
