A Tropical Semiring Multiple Matrix-Product Library on GPUs: (not just) a step towards RNA-RNA Interaction Computations
Brandon Gildemaster, P. Ghalsasi, S. Rajopadhye
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), published 2020-05-01
DOI: 10.1109/IPDPSW50202.2020.00037 (https://doi.org/10.1109/IPDPSW50202.2020.00037)
RNA-RNA interaction (RRI) is important in processes such as gene regulation, and certain classes of RRI are known to play roles in various diseases, including cancer and Alzheimer’s. Other classes are less well studied but could have biological importance, so there is a need for high-throughput tools that enable the study of these molecules. Current computational tools for RRI are slow, with execution times of days, weeks, or even months for large experiments, because the algorithms have time and space complexity of $\mathrm{O}(N^{3}M^{3})$ and $\mathrm{O}(N^{2}M^{2})$, respectively, for two sequences of lengths $N$ and $M$. No GPU parallelization of such algorithms exists. We show how the most computationally expensive portion of RRI base pair maximization algorithms, an $\mathrm{O}(N^{3}M^{3})$ computation, can be expressed as $\mathrm{O}(N^{3})$ instances of matrix products in the max-plus (tropical) semiring. We therefore propose an optimized library for the core computation of BPMax, an RRI algorithm based on weighted base pair counting. Our library multiplies multiple pairs of matrices in the max-plus semiring. We explore multiple tradeoffs: a square matrix product library attains close to the machine peak but performs 6-fold unnecessary computation and has a $2\times$ higher data footprint, while the one with the minimum work and memory footprint suffers from thread divergence and unbalanced load. We also specialize for upper banded (trapezoidal) matrices, which are relevant to a windowed version of the algorithm. STOP PRESS: just before we submitted the camera-ready version of the paper, we incorporated our library into a GPU implementation of the complete BPMax algorithm. We will report performance numbers at the workshop.
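To make the core operation concrete, the following is a minimal CPU-side sketch of a max-plus matrix product, where ordinary multiplication is replaced by addition and ordinary addition by max. It is not the authors' GPU library: the function names (`maxplus_matmul`, `maxplus_matmul_upper`, `batched_maxplus`) are hypothetical, and the upper-triangular variant assumes the operands are square and upper triangular, which the abstract suggests through its mention of banded matrices but does not state outright.

```python
NEG_INF = float("-inf")  # identity element for max, i.e. the semiring "zero"


def maxplus_matmul(a, b):
    """Dense max-plus product: c[i][j] = max over k of (a[i][k] + b[k][j])."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[NEG_INF] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                cand = a[i][k] + b[k][j]
                if cand > c[i][j]:
                    c[i][j] = cand
    return c


def maxplus_matmul_upper(a, b):
    """Max-plus product restricted to i <= k <= j.

    Assumes (hypothetically) that a and b are square and upper triangular,
    with NEG_INF below the diagonal. Returns the product and the number of
    inner-loop operations actually performed.
    """
    n = len(a)
    c = [[NEG_INF] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for k in range(i, n):
            for j in range(k, n):
                ops += 1
                cand = a[i][k] + b[k][j]
                if cand > c[i][j]:
                    c[i][j] = cand
    return c, ops


def batched_maxplus(pairs):
    """Multiply several independent (a, b) pairs, one product per pair."""
    return [maxplus_matmul(a, b) for a, b in pairs]
```

For $n \times n$ upper-triangular inputs, the restricted loop performs $n(n+1)(n+2)/6$ inner operations against $n^{3}$ for the dense loop, a ratio that approaches 6 as $n$ grows. This matches the 6-fold figure quoted above, while the irregular loop bounds illustrate why a minimum-work version can suffer from load imbalance when mapped to GPU threads.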