Exploiting parallelism for bioinformatics data analysis applications by data transformation graph
Zhenchun Huang, Yang Gu, XiaoXuan Bai
2015 8th International Conference on Biomedical Engineering and Informatics (BMEI), 2015-10-01
DOI: 10.1109/BMEI.2015.7401595
Abstract
Bioinformatics applications that are both data-intensive and computation-intensive pose great challenges for their development and optimization. To study and accelerate bioinformatics data analysis models, a method named the data transformation graph (DTG) is first introduced. It describes scientific data analysis models in terms of the dependencies and transformations among their data items. Then, taking BLAST as an example, DTG is used to study the data dependencies in this popular bioinformatics data analysis model and to parallelize it by both query splitting and database partitioning. Finally, the parallel versions of BLAST derived with DTG are implemented on Robinia, a distributed data-intensive computing middleware. Performance tests show that parallel BLAST can achieve near-linear speedup with good scalability, and that the data transformation graph can be used to study, parallelize, and optimize bioinformatics analysis applications for higher performance.
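To make the two decomposition strategies concrete, here is a minimal Python sketch under stated assumptions. It does not reproduce the authors' DTG notation or Robinia's API, neither of which is given here. All names (`parallel_levels`, `toy_search`, `split`, `query_split_search`, `db_partition_search`) are hypothetical. A DTG is modelled as a map from each transformation to its input and output data items, and transformations whose inputs are all available form one parallel wave. `toy_search` replaces BLAST with a shared 3-mer count, and threads merely stand in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Hit = Tuple[str, int]                          # (subject id, score)
Seqs = Dict[str, str]                          # sequence id -> sequence
Dtg = Dict[str, Tuple[List[str], List[str]]]   # transformation -> (input items, output items)


def parallel_levels(dtg: Dtg, sources: List[str]) -> List[List[str]]:
    """Group transformations into waves; no transformation depends on another in its wave."""
    available, pending, levels = set(sources), dict(dtg), []
    while pending:
        ready = sorted(t for t, (ins, _) in pending.items() if set(ins) <= available)
        if not ready:
            raise ValueError("DTG has a cycle or an unproduced input")
        levels.append(ready)
        for t in ready:
            available.update(pending.pop(t)[1])  # outputs become available data items
    return levels


def kmers(seq: str, k: int = 3) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_search(queries: Seqs, db: Seqs, top_n: int = 2) -> Dict[str, List[Hit]]:
    """Hypothetical stand-in for one BLAST run: score = number of shared 3-mers."""
    results = {}
    for qid, qseq in queries.items():
        hits = [(sid, len(kmers(qseq) & kmers(sseq))) for sid, sseq in db.items()]
        # Keep positive scores, ranked by score then subject id (a total order).
        hits = sorted((h for h in hits if h[1] > 0), key=lambda h: (-h[1], h[0]))
        results[qid] = hits[:top_n]
    return results


def split(items: Seqs, n: int) -> List[Seqs]:
    """Round-robin split of a sequence set into n disjoint chunks."""
    keys = sorted(items)
    return [{key: items[key] for key in keys[i::n]} for i in range(n)]


def query_split_search(queries: Seqs, db: Seqs, n: int, top_n: int = 2):
    # Each worker searches one query chunk against the whole database.
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(lambda q: toy_search(q, db, top_n), split(queries, n)))
    merged: Dict[str, List[Hit]] = {}
    for part in parts:
        merged.update(part)  # query chunks are disjoint, so the merge is a union
    return merged


def db_partition_search(queries: Seqs, db: Seqs, n: int, top_n: int = 2):
    # Each worker searches all queries against one database partition.
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = list(pool.map(lambda d: toy_search(queries, d, top_n), split(db, n)))
    merged: Dict[str, List[Hit]] = {}
    for qid in queries:
        # The global top-n is contained in the union of per-partition top-n lists.
        hits = sorted((h for part in parts for h in part[qid]), key=lambda h: (-h[1], h[0]))
        merged[qid] = hits[:top_n]
    return merged


# Hypothetical DTG for query splitting into two chunks.
EXAMPLE_DTG: Dtg = {
    "split_queries": (["queries"], ["q0", "q1"]),
    "search_0": (["q0", "db"], ["r0"]),
    "search_1": (["q1", "db"], ["r1"]),
    "merge": (["r0", "r1"], ["report"]),
}

DB = {"s1": "ACGTACGTGG", "s2": "TTGACGTACC", "s3": "GGGCCCAAAT", "s4": "ACGTTTGACG"}
QUERIES = {"q1": "ACGTACG", "q2": "GGCCCAA", "q3": "TTGACGT"}

if __name__ == "__main__":
    print(parallel_levels(EXAMPLE_DTG, ["queries", "db"]))
    # -> [['split_queries'], ['search_0', 'search_1'], ['merge']]
    baseline = toy_search(QUERIES, DB)
    print(query_split_search(QUERIES, DB, 2) == baseline)   # -> True
    print(db_partition_search(QUERIES, DB, 2) == baseline)  # -> True
```

Query splitting merges by a simple union because each query is handled by exactly one worker. Database partitioning needs a re-ranking merge: each partition keeps its own top-n hits, and the global top-n is recovered from their union. Real BLAST statistics such as E-values also depend on the total database size, so a partitioned run has to account for the full database rather than each partition alone.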