Exploiting parallelism for bioinformatics data analysis applications by data transformation graph
Authors: Zhenchun Huang, Yang Gu, XiaoXuan Bai
Published in: 2015 8th International Conference on Biomedical Engineering and Informatics (BMEI), 2015-10-01
DOI: 10.1109/BMEI.2015.7401595 (https://doi.org/10.1109/BMEI.2015.7401595)
Bioinformatics applications that are both data-intensive and computation-intensive are challenging to develop and optimize. To study and accelerate bioinformatics data analysis models, this paper first introduces a method called the data transformation graph (DTG), which describes a scientific data analysis model in terms of the dependencies and transformations among its data items. Then, taking BLAST as an example, DTG is used to study the data dependencies in this popular bioinformatics data analysis model and to parallelize it by both query splitting and database partitioning. Finally, the parallel versions of BLAST derived with DTG are implemented on Robinia, a distributed data-intensive computing middleware. Performance tests show that parallel BLAST achieves near-linear speedup with good scalability, and that the data transformation graph can be used to study, parallelize, and optimize bioinformatics analysis applications for higher performance.
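
The two parallelization strategies named in the abstract, query splitting and database partitioning, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `toy_search` is a hypothetical stand-in that counts shared k-mers and is not BLAST, and `split` and `parallel_search` are invented helper names. The sketch uses neither the Robinia middleware nor the paper's DTG notation; it only shows that when each (query, database sequence) pair can be scored independently, the work can be divided along both axes and the partial results merged.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def toy_search(queries, database, k=3):
    """Hypothetical stand-in for a sequence search (NOT BLAST).

    Scores each (query, database sequence) pair by the number of
    distinct k-mers the two sequences share.
    """
    hits = []
    for (q_id, q_seq), (d_id, d_seq) in product(queries, database):
        q_kmers = {q_seq[i:i + k] for i in range(len(q_seq) - k + 1)}
        d_kmers = {d_seq[i:i + k] for i in range(len(d_seq) - k + 1)}
        score = len(q_kmers & d_kmers)
        if score > 0:
            hits.append((q_id, d_id, score))
    return hits


def split(items, parts):
    """Split a list into at most `parts` contiguous, non-empty chunks."""
    size = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_search(queries, database, query_parts=2, db_parts=2, workers=4):
    """Run toy_search on every (query chunk, database chunk) pair and merge."""
    tasks = list(product(split(queries, query_parts), split(database, db_parts)))
    # Threads keep the sketch simple; they show the task structure only.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda task: toy_search(*task), tasks))
    merged = [hit for part in partial for hit in part]
    # Order hits by query, then by descending score, then by database id.
    return sorted(merged, key=lambda h: (h[0], -h[2], h[1]))


queries = [("q1", "ACGTAC"), ("q2", "TTTTGG"), ("q3", "GGGCCC")]
database = [("d1", "ACGTTT"), ("d2", "CCCGGG"), ("d3", "AAAAAA")]

serial = sorted(toy_search(queries, database))
parallel = sorted(parallel_search(queries, database))
print(serial == parallel)
```

In this toy setting the merge is a plain concatenation because each pair's score depends on nothing else. With real BLAST, statistics such as E-values depend on the size of the database searched, so merging results from database partitions generally needs extra care that this sketch does not attempt.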